Radionuclide Release Analysis:

The following data analysis was performed by Alan Rial for the Nuclear Transparency Project and was finalized on 10/24/2022. The main objective of this exercise is to assess the quality of open-access radionuclide release datasets by identifying patterns, trends, unusual values, and missing data, in order to inform regulatory legal submissions related to nuclear transparency and data disclosure for the Canadian Nuclear Safety Commission (CNSC).

Four categories of facilities are analyzed:

  • Nuclear Power Plants
  • Nuclear Processing Facilities
  • Canadian Nuclear Laboratories
  • Uranium Mines and Mills

Open Data Source: https://open.canada.ca/data/en/dataset/6ed50cd9-0d8c-471b-a5f6-26088298870e

About NTP:

The Nuclear Transparency Project (NTP) is a Canadian-registered not-for-profit organization dedicated to supporting open, informed, and equitable public discourse on nuclear technologies. NTP advocates for robust public access to data and other types of information and helps to produce accessible analysis of publicly available information, all with a view to supporting greater transparency in the Canadian nuclear sector. NTP is comprised of a multi-disciplinary group of experts working to examine the economic, ecological, and social facets and impacts of the Canadian nuclear sector. The organization produces public reports, academic articles, and other publicly accessible resources. It also regularly intervenes in nuclear regulatory decision-making processes. The organization seeks to support youth and early career scholars, especially those from underrepresented communities in their respective disciplines. NTP also recognizes a responsibility to model the transparency and accountability practices for which it advocates. We are committed to interdisciplinary, cross-sectoral, and equitable collaborations and dialogue between regulators, industry, civil society, members of host and potential host communities, as well as academics and professionals from science, technology, engineering and math (STEM) fields, the social sciences, and humanities.

For more information, please refer to https://nucleartransparency.ca/


Table of Contents

  • 1  Nuclear Power Plants
    • 1.1  First Look at the Dataframe
    • 1.2  Creating Geographic Reference Table
    • 1.3  Cleaning of the Dataframe
    • 1.4  Plotting by Substance
      • 1.4.1  Estimated public dose
      • 1.4.2  Tritium (HTO)
      • 1.4.3  Carbon-14
      • 1.4.4  Iodine-131
      • 1.4.5  Particulate (gross beta/gamma)
      • 1.4.6  Particulate gross alpha
      • 1.4.7  Elemental Tritium (HT)
      • 1.4.8  Total noble gases
    • 1.5  Individual Plotting by Substance & Facility
    • 1.6  Comparison with 2020 Dataframe
  • 2  Nuclear Processing Facilities:
    • 2.1  First Look at the Dataframe
    • 2.2  Creating Geographic Reference Table
    • 2.3  Cleaning of the Dataframe
    • 2.4  Plotting by Substance
      • 2.4.1  Estimated public dose
      • 2.4.2  Uranium
      • 2.4.3  Radium-226
      • 2.4.4  Elemental Tritium (HT)
      • 2.4.5  Tritium (HTO)
      • 2.4.6  Cobalt-60
      • 2.4.7  Iodine-125
      • 2.4.8  Iodine-125
      • 2.4.9  Xenon-133
      • 2.4.10  Xenon-135
      • 2.4.11  Xenon-135m
    • 2.5  Individual Plotting by Substance & Facility
    • 2.6  Comparison with 2020 Dataframe
  • 3  Canadian Nuclear Laboratories
    • 3.1  First Look at the Dataframe
    • 3.2  Creating Geographic Reference Table
    • 3.3  Cleaning of the Dataframe
    • 3.4  Plotting by Substance
      • 3.4.1  Carbon-14
      • 3.4.2  Estimated public dose
      • 3.4.3  Particulate gross alpha
      • 3.4.4  Particulate gross beta
      • 3.4.5  Radium-226
      • 3.4.6  Strontium-90
      • 3.4.7  Tritium (HTO)
      • 3.4.8  Uranium
      • 3.4.9  Americium-241
      • 3.4.10  Argon-41
      • 3.4.11  Cesium-137
      • 3.4.12  Elemental Tritium (HT)
      • 3.4.13  Iodine-125
      • 3.4.14  Iodine-131
      • 3.4.15  Plutonium-238
      • 3.4.16  Plutonium-239/240
      • 3.4.17  Total noble gases
      • 3.4.18  Xenon-133
    • 3.5  Individual Plotting by Substance & Facility
    • 3.6  Comparison with 2020 Dataframe
  • 4  Uranium Mines and Mills
    • 4.1  First Look at the Dataframe
    • 4.2  Creating Geographic Reference Table
    • 4.3  Cleaning of the Dataframe
    • 4.4  Plotting by Substance
      • 4.4.1  Uranium
      • 4.4.2  Thorium-230
      • 4.4.3  Radium-226
      • 4.4.4  Lead-210
      • 4.4.5  Polonium-210
    • 4.5  Individual Plotting by Substance & Facility
    • 4.6  Comparison with 2020 Dataframe

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

pd.options.mode.chained_assignment = None #disabling the "SettingWithCopyWarning". 

Nuclear Power Plants

First Look at the Dataframe

In [2]:
df_npp = pd.read_csv("./Datasets/Nuclear Power Plants.csv")
df_npp
Out[2]:
Year | Année NPRI ID | ID INRP Company Name | Raison Sociale Facility Name | Nom de l'installation City | Ville CSD | SDR CA or CMA | AR ou RMR Economic Region | Région économique Province | Province Latitude | Latitude Longitude | Longitude Substance Name (English) | Nom de substance (Anglais) Substance Name (French) | Nom de substance (Français) Units | Unités < Stack Emissions | Émissions de cheminées <.1 Direct Discharge | Évacuations directes Footnotes | Notes de bas de page
0 2021 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Tritium (HTO) Tritium (Eau tritiée) Bq NaN 4.43E+13 NaN 1.56E+14 NaN
1 2021 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Carbon-14 Carbone-14 Bq NaN 6.17E+09 NaN 6.07E+07 NaN
2 2021 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Total noble gases Total des gaz nobles Bq-MeV NaN NRM | NRS NaN NRM | NRS NaN
3 2021 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Iodine-131 Iode-131 Bq NaN NRM | NRS NaN NRM | NRS NaN
4 2021 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Particulate (gross beta/gamma) Particules (bêta brutes/gamma brutes) Bq NaN 5.11E+05 NaN 7.11E+07 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
494 2011 3161 Ontario Power Generation Inc. Pickering Nuclear - B Pickering Pickering Toronto Toronto ON 43.8104 -79.0676 Total noble gases Total des gaz nobles Bq-MeV NaN 8.40E+13 NaN NRM | NRS NaN
495 2011 3161 Ontario Power Generation Inc. Pickering Nuclear - B Pickering Pickering Toronto Toronto ON 43.8104 -79.0676 Iodine-131 Iode-131 Bq NaN 8.80E+06 NaN NRM | NRS NaN
496 2011 3161 Ontario Power Generation Inc. Pickering Nuclear - B Pickering Pickering Toronto Toronto ON 43.8104 -79.0676 Particulate (gross beta/gamma) Particules (bêta brutes/gamma brutes) Bq NaN 3.60E+06 NaN 1.40E+10 NaN
497 2011 3161 Ontario Power Generation Inc. Pickering Nuclear - B Pickering Pickering Toronto Toronto ON 43.8104 -79.0676 Particulate gross alpha Particules alpha brutes Bq NaN NRM | NRS NaN 4.80E+07 NaN
498 2011 3163 Ontario Power Generation Inc. Pickering Nuclear - A & B Pickering Pickering Toronto Toronto ON 43.8104 -79.0676 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a NaN 0.0009 NaN NRM | NRS Estimated public dose is calculated incorporat...

499 rows × 19 columns

(1) Why does the data start in 2011? Can we get older data?

In [3]:
# I'm creating a copy because I will need it later.

df_npp_0 = df_npp.copy()
In [4]:
df_npp["Facility Name | Nom de l'installation"].unique()
Out[4]:
array(['Gentilly-2', 'Point Lepreau Generating Station',
       'Bruce Power - A', 'Bruce Power - B', 'Bruce Power Site',
       'Darlington Nuclear', 'Pickering Nuclear - A & B',
       'Pickering Nuclear - A', 'Pickering Nuclear - B',
       'Bruce Power Site '], dtype=object)

(2) Why is Pickering Nuclear split into A and B up to 2018 but combined afterwards? Why is it combined for Estimated public dose in every year?

(3) Why is Bruce Power Site combined for Estimated public dose but split into A and B for all other substances?

In [5]:
# Renaming the bilingual columns to English only.
# The French substance-name column and the two '<' columns are not renamed here; they are dropped later.

df_npp.rename(columns={
    'Year | Année': 'Year',
    'NPRI ID | ID INRP': 'NPRI ID',
    'Company Name | Raison Sociale': 'Company Name',
    "Facility Name | Nom de l'installation": 'Facility Name',
    'City | Ville': 'City',
    'CSD | SDR': 'CSD',
    'CA or CMA | AR ou RMR': 'CA or CMA',
    'Economic Region | Région économique': 'Economic Region',
    'Province | Province': 'Province',
    'Latitude | Latitude': 'Latitude',
    'Longitude | Longitude': 'Longitude',
    'Substance Name (English) | Nom de substance (Anglais)': 'Substance Name (English)',
    'Units | Unités': 'Units',
    'Stack Emissions | Émissions de cheminées': 'Stack Emissions',
    'Direct Discharge | Évacuations directes': 'Direct Discharge',
    'Footnotes | Notes de bas de page': 'Footnotes',
}, inplace=True)
df_npp.head()
Out[5]:
Year NPRI ID Company Name Facility Name City CSD CA or CMA Economic Region Province Latitude Longitude Substance Name (English) Substance Name (French) | Nom de substance (Français) Units < Stack Emissions <.1 Direct Discharge Footnotes
0 2021 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Tritium (HTO) Tritium (Eau tritiée) Bq NaN 4.43E+13 NaN 1.56E+14 NaN
1 2021 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Carbon-14 Carbone-14 Bq NaN 6.17E+09 NaN 6.07E+07 NaN
2 2021 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Total noble gases Total des gaz nobles Bq-MeV NaN NRM | NRS NaN NRM | NRS NaN
3 2021 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Iodine-131 Iode-131 Bq NaN NRM | NRS NaN NRM | NRS NaN
4 2021 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Particulate (gross beta/gamma) Particules (bêta brutes/gamma brutes) Bq NaN 5.11E+05 NaN 7.11E+07 NaN

I noticed that some values are reported as the codes "LD" (Level of Detection) or "NRM | NRS" (Not Required to Monitor) instead of numbers. I will first summarize where these codes appear, and then replace them with zeros so that the columns can be plotted:

In [6]:
# Stack Emission column first:

df_npp_miss_stack = df_npp[df_npp['Stack Emissions'].isin(['LD', 'NRM | NRS'])]
df_npp_miss_stack[['Year','Facility Name', 'Substance Name (English)', 'Stack Emissions']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
Out[6]:
Year Facility Name Substance Name (English) Stack Emissions
239 2016 Bruce Power - B Iodine-131 LD
484 2011 Darlington Nuclear Particulate gross alpha NRM | NRS
437 2012 Darlington Nuclear Particulate gross alpha NRM | NRS
455 2011 Gentilly-2 Iodine-131 LD
361 2013 Gentilly-2 Iodine-131 LD
314 2014 Gentilly-2 Iodine-131 LD
267 2015 Gentilly-2 Iodine-131 NRM | NRS
220 2016 Gentilly-2 Iodine-131 NRM | NRS
173 2017 Gentilly-2 Iodine-131 NRM | NRS
126 2018 Gentilly-2 Iodine-131 NRM | NRS
85 2019 Gentilly-2 Iodine-131 NRM | NRS
44 2020 Gentilly-2 Iodine-131 NRM | NRS
3 2021 Gentilly-2 Iodine-131 NRM | NRS
266 2015 Gentilly-2 Total noble gases NRM | NRS
219 2016 Gentilly-2 Total noble gases NRM | NRS
172 2017 Gentilly-2 Total noble gases NRM | NRS
125 2018 Gentilly-2 Total noble gases NRM | NRS
84 2019 Gentilly-2 Total noble gases NRM | NRS
43 2020 Gentilly-2 Total noble gases NRM | NRS
2 2021 Gentilly-2 Total noble gases NRM | NRS
491 2011 Pickering Nuclear - A Particulate gross alpha NRM | NRS
444 2012 Pickering Nuclear - A Particulate gross alpha NRM | NRS
497 2011 Pickering Nuclear - B Particulate gross alpha NRM | NRS
450 2012 Pickering Nuclear - B Particulate gross alpha NRM | NRS
461 2011 Point Lepreau Generating Station Iodine-131 NRM | NRS
414 2012 Point Lepreau Generating Station Iodine-131 NRM | NRS
367 2013 Point Lepreau Generating Station Iodine-131 NRM | NRS
320 2014 Point Lepreau Generating Station Iodine-131 NRM | NRS
462 2011 Point Lepreau Generating Station Particulate (gross beta/gamma) NRM | NRS
415 2012 Point Lepreau Generating Station Particulate (gross beta/gamma) NRM | NRS
368 2013 Point Lepreau Generating Station Particulate (gross beta/gamma) NRM | NRS
321 2014 Point Lepreau Generating Station Particulate (gross beta/gamma) NRM | NRS
463 2011 Point Lepreau Generating Station Particulate gross alpha NRM | NRS
416 2012 Point Lepreau Generating Station Particulate gross alpha NRM | NRS
369 2013 Point Lepreau Generating Station Particulate gross alpha NRM | NRS
322 2014 Point Lepreau Generating Station Particulate gross alpha NRM | NRS
275 2015 Point Lepreau Generating Station Particulate gross alpha NRM | NRS
228 2016 Point Lepreau Generating Station Particulate gross alpha NRM | NRS
181 2017 Point Lepreau Generating Station Particulate gross alpha NRM | NRS
134 2018 Point Lepreau Generating Station Particulate gross alpha NRM | NRS
93 2019 Point Lepreau Generating Station Particulate gross alpha NRM | NRS
52 2020 Point Lepreau Generating Station Particulate gross alpha NRM | NRS
11 2021 Point Lepreau Generating Station Particulate gross alpha NRM | NRS
460 2011 Point Lepreau Generating Station Total noble gases NRM | NRS

(4) Summary of Missing Data (LD / NRM) for Stack Emissions:

  • Bruce Power - B only: Iodine-131 2016 (LD).
  • Darlington: Alpha 2011 & 2012 (NRM).
  • Gentilly-2: Iodine-131 2011, 2013, 2014 (LD), & 2015 to 2021 (NRM); Noble Gases from 2015 to 2021 (NRM).
  • Pickering A & B: Alpha 2011 & 2012 (NRM).
  • Point Lepreau: Iodine-131 from 2011 to 2014 (NRM); Beta/gamma from 2011 to 2014 (NRM); Alpha from 2011 to 2021 (NRM); Noble Gases 2011 (NRM).
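
The summary above was compiled by reading the filtered table. As a cross-check, the same grouping can be produced programmatically. The following is only a minimal sketch on a small made-up dataframe: the rows, the facility values, and the helper name `summarize_codes` are illustrative assumptions and are not part of the original notebook. It groups the coded entries by facility, substance, and code, and reports the first year, the last year, and the number of years for each.

```python
import pandas as pd

# Toy data shaped like the renamed dataframe (values are illustrative only).
toy = pd.DataFrame({
    "Year": [2011, 2012, 2013, 2015, 2016, 2016],
    "Facility Name": ["Gentilly-2"] * 5 + ["Bruce Power - B"],
    "Substance Name (English)": ["Iodine-131"] * 6,
    "Stack Emissions": ["LD", "1.0E+05", "LD", "NRM | NRS", "NRM | NRS", "LD"],
})


def summarize_codes(df, column, codes=("LD", "NRM | NRS")):
    """Hypothetical helper: one row per facility/substance/code with its span of years."""
    coded = df[df[column].isin(codes)]
    return (
        coded.groupby(["Facility Name", "Substance Name (English)", column])["Year"]
        .agg(first_year="min", last_year="max", n_years="count")
        .reset_index()
    )


summary = summarize_codes(toy, "Stack Emissions")
print(summary)
```

When `n_years` is smaller than the span between `first_year` and `last_year`, the code was not used in every year of that range, so the individual years should be listed rather than a range.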
In [7]:
# Direct Discharge column next:

df_npp_miss_discharge = df_npp[df_npp['Direct Discharge'].isin(['LD', 'NRM | NRS'])]
df_npp_miss_discharge[['Year','Facility Name', 'Substance Name (English)', 'Direct Discharge']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
Out[7]:
Year Facility Name Substance Name (English) Direct Discharge
468 2011 Bruce Power - A Iodine-131 NRM | NRS
421 2012 Bruce Power - A Iodine-131 NRM | NRS
374 2013 Bruce Power - A Iodine-131 NRM | NRS
327 2014 Bruce Power - A Iodine-131 NRM | NRS
280 2015 Bruce Power - A Iodine-131 NRM | NRS
... ... ... ... ...
178 2017 Point Lepreau Generating Station Total noble gases NRM | NRS
131 2018 Point Lepreau Generating Station Total noble gases NRM | NRS
90 2019 Point Lepreau Generating Station Total noble gases NRM | NRS
49 2020 Point Lepreau Generating Station Total noble gases NRM | NRS
8 2021 Point Lepreau Generating Station Total noble gases NRM | NRS

239 rows × 4 columns

(5) Summary of Missing Data (LD / NRM) for Direct Discharge:

  • Bruce Power A: Iodine-131 from 2011 to 2021 (NRM); Alpha from 2017 to 2021 (LD); Noble Gases from 2011 to 2021 (NRM).
  • Bruce Power B: Iodine-131 from 2011 to 2021 (NRM); Alpha from 2016 to 2021 (LD); Noble Gases from 2011 to 2021 (NRM).
  • Darlington: Elemental Tritium from 2011 to 2021 (NRM); Iodine-131 from 2011 to 2021 (NRM); Noble Gases from 2011 to 2021 (NRM).
  • Gentilly-2: Iodine-131 from 2011 to 2021 (NRM); Noble Gases from 2011 to 2021 (NRM).
  • Pickering A & B: Iodine-131 from 2011 to 2021 (NRM); Noble Gases from 2011 to 2021 (NRM).
  • Pickering A: Carbon-14 from 2011 to 2018 (NRM) & Alpha from 2011 to 2018 (NRM).
  • Point Lepreau: Noble Gases from 2011 to 2021 (NRM); Iodine-131 from 2011 to 2019 (NRM).

Note 1: Estimated public dose has no Direct Discharge values, because it is reported once, under Stack Emissions, "incorporating all major release pathways (emissions and discharges)" according to the footnote.

Note 2: Noble gases have no Direct Discharge values, which makes sense, as they are only sparingly soluble in water.

Note 3: The only reported Direct Discharge values for Iodine-131 are from Point Lepreau in 2020 and 2021.

In [8]:
# I noticed one value is "0":

df_npp[(df_npp['Stack Emissions'] == '0.00E+00') | (df_npp['Direct Discharge'] == '0.00E+00')]
Out[8]:
Year NPRI ID Company Name Facility Name City CSD CA or CMA Economic Region Province Latitude Longitude Substance Name (English) Substance Name (French) | Nom de substance (Français) Units < Stack Emissions <.1 Direct Discharge Footnotes
22 2021 7041 Bruce Power LP Bruce Power - B Tiverton Kincardine NaN Stratford--Bruce Peninsula ON 44.3289 -81.5916 Iodine-131 Iode-131 Bq NaN 0.00E+00 NaN NRM | NRS NaN

(5') Why is the Bruce Power - B Iodine-131 stack emission for 2021 reported as "0.00E+00"? This seems unusual considering the previous values.

Creating Geographic Reference Table

In [9]:
# Combining Bruce Power A & B and Pickering Nuclear A & B into a geographic reference table, so I can delete these columns from the dataframe I will use for plotting:

df_npp_geography = df_npp[['Facility Name', 'NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude']].copy()

# Map the unit-level names (and the variant with a trailing space) to the site-level names.
# Assigning the result back avoids relying on chained in-place replacement.
df_npp_geography['Facility Name'] = df_npp_geography['Facility Name'].replace({
    'Bruce Power - A': 'Bruce Power Site',
    'Bruce Power - B': 'Bruce Power Site',
    'Bruce Power Site ': 'Bruce Power Site',
    'Pickering Nuclear - A': 'Pickering Nuclear - A & B',
    'Pickering Nuclear - B': 'Pickering Nuclear - A & B',
})
df_npp_geography
Out[9]:
Facility Name NPRI ID Company Name City CSD CA or CMA Economic Region Province Latitude Longitude
0 Gentilly-2 1445 Hydro-Québec Bécancour NaN NaN NaN QC 46.3958 -72.3569
1 Gentilly-2 1445 Hydro-Québec Bécancour NaN NaN NaN QC 46.3958 -72.3569
2 Gentilly-2 1445 Hydro-Québec Bécancour NaN NaN NaN QC 46.3958 -72.3569
3 Gentilly-2 1445 Hydro-Québec Bécancour NaN NaN NaN QC 46.3958 -72.3569
4 Gentilly-2 1445 Hydro-Québec Bécancour NaN NaN NaN QC 46.3958 -72.3569
... ... ... ... ... ... ... ... ... ... ...
494 Pickering Nuclear - A & B 3161 Ontario Power Generation Inc. Pickering Pickering Toronto Toronto ON 43.8104 -79.0676
495 Pickering Nuclear - A & B 3161 Ontario Power Generation Inc. Pickering Pickering Toronto Toronto ON 43.8104 -79.0676
496 Pickering Nuclear - A & B 3161 Ontario Power Generation Inc. Pickering Pickering Toronto Toronto ON 43.8104 -79.0676
497 Pickering Nuclear - A & B 3161 Ontario Power Generation Inc. Pickering Pickering Toronto Toronto ON 43.8104 -79.0676
498 Pickering Nuclear - A & B 3163 Ontario Power Generation Inc. Pickering Pickering Toronto Toronto ON 43.8104 -79.0676

499 rows × 10 columns

In [10]:
# Cleaning the geography dataframe:

df_npp_geography.drop_duplicates(inplace=True)
df_npp_geography = df_npp_geography.reset_index(drop=True)
df_npp_geography
Out[10]:
Facility Name NPRI ID Company Name City CSD CA or CMA Economic Region Province Latitude Longitude
0 Gentilly-2 1445 Hydro-Québec Bécancour NaN NaN NaN QC 46.3958 -72.3569
1 Point Lepreau Generating Station 1710 New Brunswick Power Corporation Maces Bay Musquash Saint John Saint John--St. Stephen NB 45.0690 -66.4556
2 Bruce Power Site 7041 Bruce Power LP Tiverton Kincardine NaN Stratford--Bruce Peninsula ON 44.3289 -81.5916
3 Darlington Nuclear 3163 Ontario Power Generation Inc. Bowmanville Clarington Oshawa Toronto ON 43.8681 -78.7250
4 Pickering Nuclear - A & B 3161 Ontario Power Generation Inc. Pickering Pickering Toronto Toronto ON 43.8104 -79.0676
5 Pickering Nuclear - A & B 3163 Ontario Power Generation Inc. Pickering Pickering Toronto Toronto ON 43.8104 -79.0676
In [11]:
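# Row 5 is a Pickering row carrying NPRI ID 3163, which is the ID listed for Darlington.
# Setting it to Pickering's 3161 lets drop_duplicates collapse the two Pickering rows:
df_npp_geography.at[5, 'NPRI ID'] = 3161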
df_npp_geography.drop_duplicates(inplace=True)
df_npp_geography = df_npp_geography.reset_index(drop=True)
df_npp_geography
Out[11]:
Facility Name NPRI ID Company Name City CSD CA or CMA Economic Region Province Latitude Longitude
0 Gentilly-2 1445 Hydro-Québec Bécancour NaN NaN NaN QC 46.3958 -72.3569
1 Point Lepreau Generating Station 1710 New Brunswick Power Corporation Maces Bay Musquash Saint John Saint John--St. Stephen NB 45.0690 -66.4556
2 Bruce Power Site 7041 Bruce Power LP Tiverton Kincardine NaN Stratford--Bruce Peninsula ON 44.3289 -81.5916
3 Darlington Nuclear 3163 Ontario Power Generation Inc. Bowmanville Clarington Oshawa Toronto ON 43.8681 -78.7250
4 Pickering Nuclear - A & B 3161 Ontario Power Generation Inc. Pickering Pickering Toronto Toronto ON 43.8104 -79.0676

Cleaning of the Dataframe

In [12]:
# Cleaning the LD (Level of detection) & NRM (Not required to monitor) values so I can convert the columns into numeric for later plotting.
# Note: after this step a coded value can no longer be told apart from a reported zero.

for col in ['Stack Emissions', 'Direct Discharge']:
    df_npp[col] = df_npp[col].replace({'LD': 0, 'NRM | NRS': 0})
In [13]:
# At row 28 there is a problem with a figure: it uses a decimal comma (1,2e12 instead of 1.2e12). I will correct it:

df_npp.at[28, 'Stack Emissions'] = 1.2e12
In [14]:
# Converted columns to numeric for plotting:

df_npp['Stack Emissions'] = pd.to_numeric(df_npp['Stack Emissions'])
df_npp['Direct Discharge'] = pd.to_numeric(df_npp['Direct Discharge'])
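
Replacing the codes with zeros makes plotting possible, but it also makes "not required to monitor" look like "nothing released". One alternative, shown here only as a sketch on made-up values, is to keep the original code in a separate column and convert everything that is not a number to NaN, which matplotlib draws as a gap in the line. The column name `Stack Code` and the helper `to_number` are assumptions introduced for illustration; the sketch also assumes that a comma in a value is always a decimal comma, never a thousands separator.

```python
import pandas as pd

# Made-up values covering the cases seen in the dataset: a number, both codes,
# a decimal comma, and a reported zero.
raw = pd.Series(["4.43E+13", "LD", "NRM | NRS", "1,2e12", "0.00E+00"])


def to_number(values):
    """Hypothetical helper: parse numbers, read a decimal comma as a point, turn codes into NaN."""
    cleaned = values.astype(str).str.replace(",", ".", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")


toy = pd.DataFrame({"Stack Emissions (raw)": raw})
toy["Stack Emissions"] = to_number(raw)
# Keep the code (LD / NRM | NRS) wherever the value could not be parsed as a number.
toy["Stack Code"] = raw.where(toy["Stack Emissions"].isna())
print(toy)
```

With this layout a reported zero stays 0.0 while coded entries stay missing, so the two cases remain distinguishable in later summaries.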
In [15]:
# Combining Pickering Nuclear A & Pickering Nuclear B so I can plot every year (they are reported combined after 2018). Also correcting the name of one data point of Bruce Power (trailing space):

df_npp['Facility Name'] = df_npp['Facility Name'].replace({
    'Bruce Power Site ': 'Bruce Power Site',
    'Pickering Nuclear - A': 'Pickering Nuclear - A & B',
    'Pickering Nuclear - B': 'Pickering Nuclear - A & B',
})
df_npp['Facility Name'].unique()
Out[15]:
array(['Gentilly-2', 'Point Lepreau Generating Station',
       'Bruce Power - A', 'Bruce Power - B', 'Bruce Power Site',
       'Darlington Nuclear', 'Pickering Nuclear - A & B'], dtype=object)
In [16]:
df_npp.drop(columns=['NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude', 'Substance Name (French) | Nom de substance (Français)', '<', '<.1', 'Footnotes'], inplace=True)
df_npp.head()
Out[16]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
0 2021 Gentilly-2 Tritium (HTO) Bq 4.430000e+13 1.560000e+14
1 2021 Gentilly-2 Carbon-14 Bq 6.170000e+09 6.070000e+07
2 2021 Gentilly-2 Total noble gases Bq-MeV 0.000000e+00 0.000000e+00
3 2021 Gentilly-2 Iodine-131 Bq 0.000000e+00 0.000000e+00
4 2021 Gentilly-2 Particulate (gross beta/gamma) Bq 5.110000e+05 7.110000e+07
In [17]:
# Aggregating Pickering A/B so there is a single value per year across the whole period (in recent years only the 'combined' site is reported):

df_npp = df_npp.groupby(['Year', 'Facility Name', 'Substance Name (English)', 'Units'],as_index=False).agg({'Stack Emissions': 'sum', 'Direct Discharge': 'sum'})
df_npp.head()
Out[17]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
0 2011 Bruce Power - A Carbon-14 Bq 1.360000e+12 1.700000e+09
1 2011 Bruce Power - A Iodine-131 Bq 3.580000e+07 0.000000e+00
2 2011 Bruce Power - A Particulate (gross beta/gamma) Bq 7.060000e+06 6.290000e+08
3 2011 Bruce Power - A Particulate gross alpha Bq 5.990000e+05 1.010000e+06
4 2011 Bruce Power - A Total noble gases Bq-MeV 6.680000e+13 0.000000e+00
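
Because the groupby sums every row that shares the same year, facility, substance, and unit, it is worth confirming that only the merged Pickering rows are actually added together. The following is a minimal sketch on made-up rows (the values are illustrative, not taken from the dataset) that counts the rows behind each group before summing.

```python
import pandas as pd

toy = pd.DataFrame({
    "Year": [2011, 2011, 2011],
    "Facility Name": [
        "Pickering Nuclear - A & B",  # formerly "Pickering Nuclear - A"
        "Pickering Nuclear - A & B",  # formerly "Pickering Nuclear - B"
        "Darlington Nuclear",
    ],
    "Substance Name (English)": ["Tritium (HTO)"] * 3,
    "Units": ["Bq"] * 3,
    "Stack Emissions": [2.0e14, 3.5e14, 1.4e14],
    "Direct Discharge": [1.0e14, 2.2e14, 1.1e14],
})

keys = ["Year", "Facility Name", "Substance Name (English)", "Units"]

# A count above 1 means several source rows will be summed into one value.
rows_per_group = toy.groupby(keys).size()
merged_groups = rows_per_group[rows_per_group > 1]

totals = toy.groupby(keys, as_index=False).agg(
    {"Stack Emissions": "sum", "Direct Discharge": "sum"}
)
print(merged_groups)
print(totals)
```

Any facility other than Pickering appearing in `merged_groups` would point to duplicated rows in the source file rather than an intended merge.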
In [18]:
# I'm saving the clean dataframe to do a dashboard in Tableau.

df_npp.to_csv("./Datasets/df_npp.csv", index=True, header=True)

Plotting by Substance

In [19]:
df_npp['Substance Name (English)'].unique()
Out[19]:
array(['Carbon-14', 'Iodine-131', 'Particulate (gross beta/gamma)',
       'Particulate gross alpha', 'Total noble gases', 'Tritium (HTO)',
       'Estimated public dose (see footnote)', 'Elemental Tritium (HT)'],
      dtype=object)

Estimated public dose

In [20]:
df_npp_epd = df_npp[df_npp['Substance Name (English)'] == 'Estimated public dose (see footnote)']
df_npp_epd.head()
Out[20]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
12 2011 Bruce Power Site Estimated public dose (see footnote) mSv/a 0.0011 0.0
15 2011 Darlington Nuclear Estimated public dose (see footnote) mSv/a 0.0006 0.0
22 2011 Gentilly-2 Estimated public dose (see footnote) mSv/a 0.0015 0.0
28 2011 Pickering Nuclear - A & B Estimated public dose (see footnote) mSv/a 0.0009 0.0
35 2011 Point Lepreau Generating Station Estimated public dose (see footnote) mSv/a 0.0003 0.0
In [21]:
# Estimated public dose is calculated incorporating all major release pathways (emissions and discharges)

plt.figure(figsize=(16,6))

year = df_npp_epd['Year'].unique()

for facility in df_npp_epd['Facility Name'].unique():
    plt.plot(year, df_npp_epd[df_npp_epd['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Estimated public dose [mSv/a]', size=12)
plt.legend(df_npp_epd['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(6) Why does Gentilly-2 show a spike in 2012/2013 and another in 2018, years after decommissioning started (2012)?

Tritium (HTO)
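
The plotting cells in this notebook all follow the same pattern, and they pass one shared array of years as the x-values for every facility. That works only while each facility has a value for every year. The following is a sketch of a reusable helper that aligns the series on the year through a pivot instead; the function name `plot_by_facility` and the toy data are assumptions for illustration, not part of the original notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_by_facility(df, column, ylabel):
    """Hypothetical helper: one line per facility, aligned on Year via a pivot."""
    # Rows = years, columns = facilities; a missing year/facility pair becomes NaN.
    wide = df.pivot(index="Year", columns="Facility Name", values=column).sort_index()
    fig, ax = plt.subplots(figsize=(16, 6))
    for facility in wide.columns:
        ax.plot(wide.index, wide[facility], marker="o", label=facility)
    ax.set_xticks(wide.index)
    ax.set_xlabel("Year", size=16)
    ax.set_ylabel(ylabel, size=12)
    ax.legend(loc="upper left")
    ax.grid()
    return wide, ax


# Made-up values: "Site Y" has no entry for 2011.
toy = pd.DataFrame({
    "Year": [2011, 2012, 2013, 2012, 2013],
    "Facility Name": ["Site X"] * 3 + ["Site Y"] * 2,
    "Stack Emissions": [0.0011, 0.0012, 0.0010, 0.0006, 0.0007],
})
wide, ax = plot_by_facility(toy, "Stack Emissions", "Estimated public dose [mSv/a]")
```

The pivot requires a single row per year and facility, which is the case after the groupby aggregation earlier in this section. A facility with a missing year then shows a gap instead of raising a length mismatch or shifting its line. Calling `plt.show()` afterwards displays the figure as in the other cells.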

In [22]:
df_npp_tritium = df_npp[df_npp['Substance Name (English)'] == 'Tritium (HTO)']
df_npp_tritium.head()
Out[22]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
5 2011 Bruce Power - A Tritium (HTO) Bq 6.000000e+14 2.950000e+14
11 2011 Bruce Power - B Tritium (HTO) Bq 7.170000e+14 5.100000e+14
20 2011 Darlington Nuclear Tritium (HTO) Bq 1.400000e+14 1.100000e+14
26 2011 Gentilly-2 Tritium (HTO) Bq 1.900000e+14 2.440000e+14
33 2011 Pickering Nuclear - A & B Tritium (HTO) Bq 5.500000e+14 3.200000e+14
In [23]:
plt.figure(figsize=(16,6))

year = df_npp_tritium['Year'].unique()

for facility in df_npp_tritium['Facility Name'].unique():
    plt.plot(year, df_npp_tritium[df_npp_tritium['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Stack Emissions [Bq]', size=12)
plt.legend(df_npp_tritium['Facility Name'].unique(), loc='lower right')
plt.grid()

plt.show()

(7) Why does Bruce Power - A have peaks in 2014, 2017, & 2021?

(8) Why does Bruce Power - B have a peak in 2017?

In [24]:
plt.figure(figsize=(16,6))

year = df_npp_tritium['Year'].unique()

for facility in df_npp_tritium['Facility Name'].unique():
    plt.plot(year, df_npp_tritium[df_npp_tritium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_tritium['Facility Name'].unique(), loc='upper center')
plt.grid()

plt.show()

(9) Why does Bruce Power - B have a peak in 2012 & a general increasing trend?

(10) Why does Point Lepreau have a peak in 2012?

(11) Why does Darlington have a peak in 2017?

Carbon-14
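
The peaks raised in these questions were identified by eye from the plots. As a complement, a simple screening rule can list candidate years for follow-up. The sketch below flags values that exceed a chosen multiple of each facility's own median. The factor of 2, the toy values, and the helper name `flag_peaks` are arbitrary assumptions for illustration; they are not a regulatory threshold and were not used in the original analysis.

```python
import pandas as pd

# Made-up values: "Site X" has one year well above its usual level.
toy = pd.DataFrame({
    "Year": [2011, 2012, 2013, 2014] * 2,
    "Facility Name": ["Site X"] * 4 + ["Site Y"] * 4,
    "Direct Discharge": [1.0e14, 3.0e14, 1.1e14, 0.9e14, 2.0e13, 2.1e13, 1.9e13, 2.2e13],
})


def flag_peaks(df, column, factor=2.0):
    """Hypothetical helper: rows whose value exceeds factor x the facility's median."""
    median = df.groupby("Facility Name")[column].transform("median")
    return df[df[column] > factor * median]


peaks = flag_peaks(toy, "Direct Discharge")
print(peaks[["Year", "Facility Name", "Direct Discharge"]])
```

A rule like this only produces a shortlist. Whether a flagged year is a genuine release event, a reporting change, or a data-entry problem still has to be asked of the licensee or the regulator.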

In [25]:
df_npp_carbon = df_npp[df_npp['Substance Name (English)'] == 'Carbon-14']
df_npp_carbon.head()
Out[25]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
0 2011 Bruce Power - A Carbon-14 Bq 1.360000e+12 1.700000e+09
6 2011 Bruce Power - B Carbon-14 Bq 1.440000e+12 2.820000e+09
13 2011 Darlington Nuclear Carbon-14 Bq 1.000000e+12 1.900000e+09
21 2011 Gentilly-2 Carbon-14 Bq 2.710000e+11 1.880000e+10
27 2011 Pickering Nuclear - A & B Carbon-14 Bq 1.770000e+12 2.200000e+09
In [26]:
plt.figure(figsize=(16,6))

year = df_npp_carbon['Year'].unique()

for facility in df_npp_carbon['Facility Name'].unique():
    plt.plot(year, df_npp_carbon[df_npp_carbon['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Carbon-14 - Stack Emissions [Bq]', size=12)
plt.legend(df_npp_carbon['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(12) Why does Pickering have a spike leading up to 2018?

(13) Why does Bruce Power - A have peaks in 2013 & 2015?

In [27]:
plt.figure(figsize=(16,6))

year = df_npp_carbon['Year'].unique()

for facility in df_npp_carbon['Facility Name'].unique():
    plt.plot(year, df_npp_carbon[df_npp_carbon['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Carbon-14 - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_carbon['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(14) Why does Gentilly-2 report increased amounts between 2014 and 2017, with peaks in 2015 & 2017?

In [28]:
# I will plot without Gentilly-2 to get a better look at the other facilities:

df_npp_carbon2 = df_npp_carbon[df_npp_carbon['Facility Name'] != 'Gentilly-2']

plt.figure(figsize=(16,6))

year = df_npp_carbon2['Year'].unique()

for facility in df_npp_carbon2['Facility Name'].unique():
    plt.plot(year, df_npp_carbon2[df_npp_carbon2['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Carbon-14 - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_carbon2['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(15) Why does Point Lepreau have peaks in 2012, 2015, and 2019?

(16) Why does Bruce Power - B have a spike leading up to 2015?

(17) Why does Darlington have peaks in 2012 & 2015?

Iodine-131

In [29]:
df_npp_iodine = df_npp[df_npp['Substance Name (English)'] == 'Iodine-131']
df_npp_iodine.head()
Out[29]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
1 2011 Bruce Power - A Iodine-131 Bq 35800000.0 0.0
7 2011 Bruce Power - B Iodine-131 Bq 41900000.0 0.0
16 2011 Darlington Nuclear Iodine-131 Bq 150000000.0 0.0
23 2011 Gentilly-2 Iodine-131 Bq 0.0 0.0
29 2011 Pickering Nuclear - A & B Iodine-131 Bq 23800000.0 0.0
  • Lots of zero values (they come from the 'LD' & 'NRM' that I changed to zero to be able to plot)
In [30]:
plt.figure(figsize=(16,6))

year = df_npp_iodine['Year'].unique()

for facility in df_npp_iodine['Facility Name'].unique():
    plt.plot(year, df_npp_iodine[df_npp_iodine['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Iodine-131 - Stack Emissions [Bq]', size=12)
plt.legend(df_npp_iodine['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(18) Why does Bruce Power - A have peaks in 2012 & 2014?

(19) Why did Point Lepreau increase so much in 2021?

In [31]:
plt.figure(figsize=(16,6))

year = df_npp_iodine['Year'].unique()

for facility in df_npp_iodine['Facility Name'].unique():
    plt.plot(year, df_npp_iodine[df_npp_iodine['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Iodine-131 - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_iodine['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(20) Why is Point Lepreau the only one to report values, & only in 2020 & 2021?

  • No other plant reports values.

Particulate (gross beta/gamma)

In [32]:
df_npp_beta = df_npp[df_npp['Substance Name (English)'] == 'Particulate (gross beta/gamma)']
df_npp_beta.head()
Out[32]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
2 2011 Bruce Power - A Particulate (gross beta/gamma) Bq 7060000.0 6.290000e+08
8 2011 Bruce Power - B Particulate (gross beta/gamma) Bq 50700000.0 2.380000e+09
17 2011 Darlington Nuclear Particulate (gross beta/gamma) Bq 40000000.0 3.100000e+10
24 2011 Gentilly-2 Particulate (gross beta/gamma) Bq 913000.0 5.340000e+09
30 2011 Pickering Nuclear - A & B Particulate (gross beta/gamma) Bq 11800000.0 1.910000e+10
In [33]:
plt.figure(figsize=(16,6))

year = df_npp_beta['Year'].unique()

for facility in df_npp_beta['Facility Name'].unique():
    plt.plot(year, df_npp_beta[df_npp_beta['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate (gross beta/gamma) - Stack Emissions [Bq]', size=12)
plt.legend(df_npp_beta['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(21) Why does Pickering have a peak in 2017?

In [34]:
plt.figure(figsize=(16,6))

year = df_npp_beta['Year'].unique()

for facility in df_npp_beta['Facility Name'].unique():
    plt.plot(year, df_npp_beta[df_npp_beta['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate (gross beta/gamma) - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_beta['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(22) Why does Pickering have a spike in 2020?

In [35]:
# I'll plot without Pickering to have a better look at the others:

df_npp_beta2 = df_npp_beta[df_npp_beta['Facility Name'] != 'Pickering Nuclear - A & B']

plt.figure(figsize=(16,6))

year = df_npp_beta2['Year'].unique()

for facility in df_npp_beta2['Facility Name'].unique():
    plt.plot(year, df_npp_beta2[df_npp_beta2['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate (gross beta/gamma) - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_beta2['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(23) Why did Darlington release an increased amount in 2015 & 2016?
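The plotting cells above pass the same `year` array for every facility, which only works when each facility has exactly one row per year. A minimal sketch of an alternative is to pivot the data into one column per facility, so that gaps become NaN and excluding a dominant facility is a column operation. The sample values and facility names below are hypothetical.

```python
import pandas as pd

# Hypothetical sample: "Facility B" is missing 2020 (values are illustrative only).
sample = pd.DataFrame({
    "Year": [2019, 2020, 2021, 2019, 2021],
    "Facility Name": ["Facility A"] * 3 + ["Facility B"] * 2,
    "Direct Discharge": [1.0e9, 2.0e9, 1.5e9, 3.0e7, 4.0e7],
})

# One row per year, one column per facility; missing combinations become NaN.
wide = sample.pivot_table(index="Year", columns="Facility Name",
                          values="Direct Discharge", aggfunc="sum")

# Excluding a dominant facility is then a simple column drop.
without_a = wide.drop(columns="Facility A")
print(wide)
```

With matplotlib installed, `wide.plot(figsize=(16, 6))` would draw one line per facility. Passing `logy=True` is another way to look at facilities with very different magnitudes without removing any of them.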

In [36]:
# I'll plot without Darlington now:

df_npp_beta3 = df_npp_beta2[df_npp_beta2['Facility Name'] != 'Darlington Nuclear']

plt.figure(figsize=(16,6))

year = df_npp_beta3['Year'].unique()

for facility in df_npp_beta3['Facility Name'].unique():
    plt.plot(year, df_npp_beta3[df_npp_beta3['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate (gross beta/gamma) - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_beta3['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

Particulate gross alpha¶

In [37]:
df_npp_alpha = df_npp[df_npp['Substance Name (English)'] == 'Particulate gross alpha']
df_npp_alpha.head()
Out[37]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
3 2011 Bruce Power - A Particulate gross alpha Bq 599000.0 1010000.0
9 2011 Bruce Power - B Particulate gross alpha Bq 17800000.0 1480000.0
18 2011 Darlington Nuclear Particulate gross alpha Bq 0.0 1100000.0
31 2011 Pickering Nuclear - A & B Particulate gross alpha Bq 0.0 48000000.0
38 2011 Point Lepreau Generating Station Particulate gross alpha Bq 0.0 5800000.0
In [38]:
plt.figure(figsize=(16,6))

year = df_npp_alpha['Year'].unique()

for facility in df_npp_alpha['Facility Name'].unique():
    plt.plot(year, df_npp_alpha[df_npp_alpha['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross alpha - Stack Emissions [Bq]', size=12)
plt.legend(df_npp_alpha['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(24) Why does Bruce Power - B have a peak in 2011?

(25) Why does Darlington have a spike between 2013 & 2016?

In [39]:
plt.figure(figsize=(16,6))

year = df_npp_alpha['Year'].unique()

for facility in df_npp_alpha['Facility Name'].unique():
    plt.plot(year, df_npp_alpha[df_npp_alpha['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross alpha - Direct Discharge [Bq]', size=12)
plt.legend(df_npp_alpha['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(26) Why does Point Lepreau have a peak in 2014?

(27) Why does Pickering have a peak in 2011?

(28) Gentilly-2 doesn't report anything for Particulate gross alpha (Stack Emissions & Direct Discharge).
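Reporting gaps like the one in (28) can be summarized for every facility and substance at once. This is a sketch only, assuming cleaned numeric columns. The sample rows are illustrative, and the `reported` column is a hypothetical helper, not part of the dataset.

```python
import pandas as pd

# Illustrative rows in the shape of df_npp (not the full dataset).
sample = pd.DataFrame({
    "Facility Name": ["Gentilly-2", "Gentilly-2", "Darlington Nuclear", "Darlington Nuclear"],
    "Substance Name (English)": ["Tritium (HTO)", "Carbon-14",
                                 "Tritium (HTO)", "Particulate gross alpha"],
    "Stack Emissions": [8.11e13, 8.18e9, 1.0e14, 0.0],
    "Direct Discharge": [1.97e13, 4.92e7, 2.0e13, 1.1e6],
})

# A row counts as "reported" when either pathway has a value above zero.
sample["reported"] = (sample[["Stack Emissions", "Direct Discharge"]] > 0).any(axis=1)

# Facility x substance matrix; combinations with no rows at all are filled with False.
coverage = (sample.groupby(["Facility Name", "Substance Name (English)"])["reported"]
                  .any()
                  .unstack(fill_value=False))
print(coverage)
```

A `False` cell means the facility either has no rows for that substance or only zeros, so the two cases would still need to be told apart against the raw 'NRM | NRS' entries.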

Elemental Tritium (HT)¶

In [40]:
df_npp_ht = df_npp[df_npp['Substance Name (English)'] == 'Elemental Tritium (HT)']
df_npp_ht.head()
Out[40]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
14 2011 Darlington Nuclear Elemental Tritium (HT) Bq 8.800000e+13 0.0
55 2012 Darlington Nuclear Elemental Tritium (HT) Bq 2.600000e+13 0.0
96 2013 Darlington Nuclear Elemental Tritium (HT) Bq 1.800000e+13 0.0
137 2014 Darlington Nuclear Elemental Tritium (HT) Bq 5.200000e+13 0.0
178 2015 Darlington Nuclear Elemental Tritium (HT) Bq 1.700000e+13 0.0
In [41]:
plt.figure(figsize=(16,6))

year = df_npp_ht['Year'].unique()

for facility in df_npp_ht['Facility Name'].unique():
    plt.plot(year, df_npp_ht[df_npp_ht['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Elemental Tritium (HT) - Stack Emissions [Bq]', size=12)
plt.legend(df_npp_ht['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(29) Why does Darlington have peaks in 2011, 2014, & 2017?

(30) Why is Darlington the only facility that reports this substance (& only as Stack Emissions)?

Total noble gases¶

In [42]:
df_npp_noble = df_npp[df_npp['Substance Name (English)'] == 'Total noble gases']
df_npp_noble.head()
Out[42]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
4 2011 Bruce Power - A Total noble gases Bq-MeV 6.680000e+13 0.0
10 2011 Bruce Power - B Total noble gases Bq-MeV 3.640000e+12 0.0
19 2011 Darlington Nuclear Total noble gases Bq-MeV 2.200000e+13 0.0
25 2011 Gentilly-2 Total noble gases Bq-MeV 1.160000e+11 0.0
32 2011 Pickering Nuclear - A & B Total noble gases Bq-MeV 1.830000e+14 0.0
In [43]:
plt.figure(figsize=(16,6))

year = df_npp_noble['Year'].unique()

for facility in df_npp_noble['Facility Name'].unique():
    plt.plot(year, df_npp_noble[df_npp_noble['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Total noble gases - Stack Emissions [Bq-MeV]', size=12)
plt.legend(df_npp_noble['Facility Name'].unique(), loc='upper center')
plt.grid()

plt.show()

(31) Why does Pickering Nuclear emit so much more than the rest, and why does it have peaks in 2011, 2017, & 2021?

(32) Why does Point Lepreau have a peak in 2016?

In [44]:
plt.figure(figsize=(16,6))

year = df_npp_noble['Year'].unique()

for facility in df_npp_noble['Facility Name'].unique():
    plt.plot(year, df_npp_noble[df_npp_noble['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Total noble gases - Direct Discharge [Bq-MeV]', size=12)
plt.legend(df_npp_noble['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()
  • No values for Direct Discharge. This makes sense: noble gases are only sparingly soluble in water, so releases would be expected through the stack rather than in liquid discharge.

Individual Plotting by Substance & Facility¶

In [45]:
facilities = df_npp['Facility Name'].unique()

for f in facilities:
    df = df_npp[df_npp['Facility Name'] == f]
    print(f,'\n')
    subs = df['Substance Name (English)'].unique()
    for s in subs:
        df2 = df[df['Substance Name (English)'] == s]
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10,8))
        fig.subplots_adjust(hspace=0.5)
    
        ax1.plot(df2['Year'], df2['Stack Emissions'], color='green')
        ax1.set_title(s + ' - Stack Emissions', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax1.set_xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax1.grid()
    
        ax2.plot(df2['Year'], df2['Direct Discharge'], color='red')
        ax2.set_title(s + ' - Direct Discharge', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax2.set_xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax2.grid()
    
        plt.show()
Bruce Power - A 

Bruce Power - B 

Bruce Power Site 

Darlington Nuclear 

Gentilly-2 

Pickering Nuclear - A & B 

Point Lepreau Generating Station 

Comparison with 2020 Dataframe¶

In [46]:
df_npp_2020 = pd.read_csv("./Datasets/2020/Nuclear Power Plants.csv")
df_npp_2020.head()
Out[46]:
_id Year | Annee NPRI ID | ID INRP Company Name | Raison Sociale Facility Name | Nom de l'installation City | Ville CSD | SDR CA or CMA | AR ou RMR Economic Region | Region economique Province | Province Latitude | Latitude Longitude | Longitude Substance Name (English) | Nom de substance (Anglais) Substance Name (French) | Nom de substance (Francais) Units | Unites Stack Emissions | Emissions de cheminees Direct Discharge | Evacuations directes Footnotes | Notes de bas de page
0 1 2020 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Tritium (HTO) Tritium (Eau tritiée) Bq 8.11E+13 1.97E+13 NaN
1 2 2020 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Carbon-14 Carbone-14 Bq 8.18E+09 4.92E+07 NaN
2 3 2020 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Total noble gases Total des gaz nobles Bq-MeV LD NRM | NRS <LD = 0
3 4 2020 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Iodine-131 Iode-131 Bq LD NRM | NRS <LD = 0
4 5 2020 1445 Hydro-Québec Gentilly-2 Bécancour NaN NaN NaN QC 46.3958 -72.3569 Particulate (gross beta/gamma) Particules (bêta brutes/gamma brutes) Bq 4.47E+05 1.65E+08 NaN
In [47]:
# The 2020 file doesn't have the same columns (extra `_id`, headers without accents), so I will keep only the essentials:

cols_2020 = {
    'Year | Annee': 'Year',
    "Facility Name | Nom de l'installation": 'Facility Name',
    'Substance Name (English) | Nom de substance (Anglais)': 'Substance Name (English)',
    'Stack Emissions | Emissions de cheminees': 'Stack Emissions',
    'Direct Discharge | Evacuations directes': 'Direct Discharge',
}
# Select and rename in one step (avoids inplace=True on a column subset, which triggers SettingWithCopyWarning):
df_npp_2020 = df_npp_2020[list(cols_2020)].rename(columns=cols_2020)
df_npp_2020.head()
Out[47]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
0 2020 Gentilly-2 Tritium (HTO) 8.11E+13 1.97E+13
1 2020 Gentilly-2 Carbon-14 8.18E+09 4.92E+07
2 2020 Gentilly-2 Total noble gases LD NRM | NRS
3 2020 Gentilly-2 Iodine-131 LD NRM | NRS
4 2020 Gentilly-2 Particulate (gross beta/gamma) 4.47E+05 1.65E+08
In [48]:
# I will go back to the original copy of the dataframe & keep only the essentials to compare:

cols_2021 = {
    'Year | Année': 'Year',
    "Facility Name | Nom de l'installation": 'Facility Name',
    'Substance Name (English) | Nom de substance (Anglais)': 'Substance Name (English)',
    'Stack Emissions | Émissions de cheminées': 'Stack Emissions',
    'Direct Discharge | Évacuations directes': 'Direct Discharge',
}
# Select and rename in one step (avoids inplace=True on a column subset, which triggers SettingWithCopyWarning):
df_npp_2021 = df_npp_0[list(cols_2021)].rename(columns=cols_2021)
df_npp_2021.head()
Out[48]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
0 2021 Gentilly-2 Tritium (HTO) 4.43E+13 1.56E+14
1 2021 Gentilly-2 Carbon-14 6.17E+09 6.07E+07
2 2021 Gentilly-2 Total noble gases NRM | NRS NRM | NRS
3 2021 Gentilly-2 Iodine-131 NRM | NRS NRM | NRS
4 2021 Gentilly-2 Particulate (gross beta/gamma) 5.11E+05 7.11E+07
In [49]:
# Now that they have the same columns, I will remove 2021 from the new dataframe and compare the remaining with 2020.

df_npp_2021 = df_npp_2021[df_npp_2021['Year'] != 2021].reset_index(drop = True)
df_npp_2021.head()
Out[49]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
0 2020 Gentilly-2 Tritium (HTO) 8.11E+13 1.97E+13
1 2020 Gentilly-2 Carbon-14 8.19E+09 4.92E+07
2 2020 Gentilly-2 Total noble gases NRM | NRS NRM | NRS
3 2020 Gentilly-2 Iodine-131 NRM | NRS NRM | NRS
4 2020 Gentilly-2 Particulate (gross beta/gamma) 4.47E+05 1.65E+08
In [50]:
# I will concatenate both dataframes & keep only the rows that are not duplicated, to see the differences. This produces a df with the 2021 release's values first, and the 2020 release's values after:

df = pd.concat([df_npp_2021,df_npp_2020]).drop_duplicates(keep=False)
df

# 61 changes total. Some are 'LD' to 'NRM | NRS' changes; the others are numerical changes.
Out[50]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
1 2020 Gentilly-2 Carbon-14 8.19E+09 4.92E+07
2 2020 Gentilly-2 Total noble gases NRM | NRS NRM | NRS
3 2020 Gentilly-2 Iodine-131 NRM | NRS NRM | NRS
13 2020 Bruce Power - A Tritium (HTO) 3.40E+14 2.50E+14
14 2020 Bruce Power - A Carbon-14 1.60E+12 1.10E+09
... ... ... ... ... ...
412 2011 Gentilly-2 Carbon-14 2.71E+11 1.89E+10
415 2011 Gentilly-2 Particulate (gross beta/gamma) 9.13E+05 5.35E+09
417 2011 Point Lepreau Generating Station Tritium (HTO) 4.30E+11 3.40E+13
418 2011 Point Lepreau Generating Station Carbon-14 3.30E+15 1.40E+07
429 2011 Bruce Power - A Particulate gross alpha 5.99E+05 1.09E+06

122 rows × 5 columns
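The concatenate-and-drop-duplicates approach lists each changed record twice, once per release. A keyed merge is one possible alternative that puts the old and new values side by side. This is a sketch under the assumption that Year, Facility Name, and Substance Name together identify a row uniquely. The three sample rows use Stack Emissions values that appear in the outputs of this section.

```python
import pandas as pd

keys = ["Year", "Facility Name", "Substance Name (English)"]

# Small extracts of the two releases (strings, as read from the CSV files).
old = pd.DataFrame({
    "Year": [2020, 2020, 2011],
    "Facility Name": ["Gentilly-2", "Gentilly-2", "Point Lepreau Generating Station"],
    "Substance Name (English)": ["Tritium (HTO)", "Iodine-131", "Tritium (HTO)"],
    "Stack Emissions": ["8.11E+13", "LD", "4.30E+11"],
})
new = old.copy()
new["Stack Emissions"] = ["8.11E+13", "NRM | NRS", "4.50E+13"]

# Align both releases on the identifying columns, then keep rows whose values differ.
merged = old.merge(new, on=keys, suffixes=("_2020", "_2021"))
changed = merged[merged["Stack Emissions_2020"] != merged["Stack Emissions_2021"]]
print(changed)
```

Because the comparison is made on raw strings, a formatting-only difference would also be flagged as a change. Converting with `pd.to_numeric(..., errors='coerce')` before comparing is one way to separate the two cases.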

In [51]:
# Looking for LD (2020) to NRM (2021) changes first.

df[(df['Stack Emissions'].isin(['NRM | NRS', 'LD'])) & (df['Direct Discharge'] == 'NRM | NRS')] 

# I checked with 'Direct Discharge = NRM/LD', but there wasn't any change there. 
# The second condition is to filter out 2 results that are numerical changes in values of 'Direct Discharge', but same 'Stack Emissions'.
# Finally, note index 63, which appears only once in the output. I will address that below.
Out[51]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
2 2020 Gentilly-2 Total noble gases NRM | NRS NRM | NRS
3 2020 Gentilly-2 Iodine-131 NRM | NRS NRM | NRS
43 2019 Gentilly-2 Total noble gases NRM | NRS NRM | NRS
44 2019 Gentilly-2 Iodine-131 NRM | NRS NRM | NRS
84 2018 Gentilly-2 Total noble gases NRM | NRS NRM | NRS
85 2018 Gentilly-2 Iodine-131 NRM | NRS NRM | NRS
131 2017 Gentilly-2 Total noble gases NRM | NRS NRM | NRS
132 2017 Gentilly-2 Iodine-131 NRM | NRS NRM | NRS
178 2016 Gentilly-2 Total noble gases NRM | NRS NRM | NRS
179 2016 Gentilly-2 Iodine-131 NRM | NRS NRM | NRS
225 2015 Gentilly-2 Total noble gases NRM | NRS NRM | NRS
226 2015 Gentilly-2 Iodine-131 NRM | NRS NRM | NRS
2 2020 Gentilly-2 Total noble gases LD NRM | NRS
3 2020 Gentilly-2 Iodine-131 LD NRM | NRS
43 2019 Gentilly-2 Total noble gases LD NRM | NRS
44 2019 Gentilly-2 Iodine-131 LD NRM | NRS
63 2019 Bruce Power - B Iodine-131 LD NRM | NRS
84 2018 Gentilly-2 Total noble gases LD NRM | NRS
85 2018 Gentilly-2 Iodine-131 LD NRM | NRS
131 2017 Gentilly-2 Total noble gases LD NRM | NRS
132 2017 Gentilly-2 Iodine-131 LD NRM | NRS
178 2016 Gentilly-2 Total noble gases LD NRM | NRS
179 2016 Gentilly-2 Iodine-131 LD NRM | NRS
225 2015 Gentilly-2 Total noble gases LD NRM | NRS
226 2015 Gentilly-2 Iodine-131 LD NRM | NRS

(33) Apart from index 63 (addressed below), every 'Stack Emissions' value that was 'LD' in the 2020 dataset became 'NRM | NRS' in the 2021 dataset. This only affects Gentilly-2, for Iodine-131 & Total noble gases, from 2015 to 2020.

In [52]:
# Why does index 63 appear only once in the table above, instead of twice (once for 2021 & once for 2020)?

df.loc[[63]]
Out[52]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
63 2019 Bruce Power - B Iodine-131 4.40E+05 NRM | NRS
63 2019 Bruce Power - B Iodine-131 LD NRM | NRS
In [53]:
# Actual changes in: Index Number: 364, 417, 418, 371, 280, 281, 229, 51, 287, 429, 330, 60, 338. (I filtered out the 'LD' & 'NRM' to get this list, 13 total)

df.loc[[364, 417, 418, 371, 280, 281, 229, 51, 287, 429, 330, 60, 338]] # First value: 2021, Second Value: 2020.
Out[53]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
364 2012 Gentilly-2 Tritium (HTO) 2.09E+14 3.52E+14
364 2012 Gentilly-2 Tritium (HTO) 2.13E+14 3.51E+14
417 2011 Point Lepreau Generating Station Tritium (HTO) 4.50E+13 3.40E+13
417 2011 Point Lepreau Generating Station Tritium (HTO) 4.30E+11 3.40E+13
418 2011 Point Lepreau Generating Station Carbon-14 2.80E+10 3.80E+07
418 2011 Point Lepreau Generating Station Carbon-14 3.30E+15 1.40E+07
371 2012 Point Lepreau Generating Station Carbon-14 3.70E+10 1.40E+10
371 2012 Point Lepreau Generating Station Carbon-14 3.70E+10 3.80E+10
280 2014 Point Lepreau Generating Station Particulate (gross beta/gamma) NRM | NRS 1.00E+07
280 2014 Point Lepreau Generating Station Particulate (gross beta/gamma) NRM | NRS 1.50E+08
281 2014 Point Lepreau Generating Station Particulate gross alpha NRM | NRS 8.30E+07
281 2014 Point Lepreau Generating Station Particulate gross alpha NRM | NRS 8.60E+06
229 2015 Point Lepreau Generating Station Tritium (HTO) 1.40E+14 1.40E+14
229 2015 Point Lepreau Generating Station Tritium (HTO) 1.40E+13 1.40E+14
51 2019 Point Lepreau Generating Station Particulate (gross beta/gamma) 2.20E+06 8.40E+07
51 2019 Point Lepreau Generating Station Particulate (gross beta/gamma) 1.14E+08 8.40E+07
287 2014 Bruce Power - A Particulate (gross beta/gamma) 3.13E+06 9.57E+08
287 2014 Bruce Power - A Particulate (gross beta/gamma) 3.13E+06 1.02E+09
429 2011 Bruce Power - A Particulate gross alpha 5.99E+05 1.01E+06
429 2011 Bruce Power - A Particulate gross alpha 5.99E+05 1.09E+06
330 2013 Bruce Power - A Tritium (HTO) 5.09E+14 1.96E+14
330 2013 Bruce Power - A Tritium (HTO) 5.04E+14 1.96E+14
60 2019 Bruce Power - B Tritium (HTO) 3.29E+14 8.82E+14
60 2019 Bruce Power - B Tritium (HTO) 3.29E+14 8.84E+14
338 2013 Bruce Power - B Total noble gases 3.71E+12 NRM | NRS
338 2013 Bruce Power - B Total noble gases 5.25E+13 NRM | NRS

(35) Why did these 13 sets of values change between reports? Why wasn't the change addressed anywhere?
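To judge whether a revision looks like a rounding correction or something larger, the ratio between the two releases can be computed. The sketch below uses the Stack Emissions values of three pairs from the table above (indices 417, 418, and 364). The factor-of-10 threshold and the column names are assumptions chosen for illustration only.

```python
import pandas as pd

# Stack Emissions values copied from three of the pairs in the table above.
pairs = pd.DataFrame({
    "Index": [417, 418, 364],
    "Release 2021": ["4.50E+13", "2.80E+10", "2.09E+14"],
    "Release 2020": ["4.30E+11", "3.30E+15", "2.13E+14"],
})

new = pd.to_numeric(pairs["Release 2021"], errors="coerce")
old = pd.to_numeric(pairs["Release 2020"], errors="coerce")

# Ratio of the newer value to the older one; values far from 1 suggest more than a rounding fix.
pairs["Ratio"] = new / old

# Assumed threshold: flag changes of more than a factor of 10 in either direction.
pairs["Large change"] = (pairs["Ratio"] > 10) | (pairs["Ratio"] < 0.1)
print(pairs)
```

On these three pairs, the two Point Lepreau 2011 entries are flagged, while the Gentilly-2 2012 revision is a change of about 2%.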


Nuclear Processing Facilities:¶

First Look at the Dataframe¶

In [54]:
df_npf = pd.read_csv("./Datasets/Nuclear Processing Facilities.csv")
df_npf
Out[54]:
Year | Année NPRI ID | ID INRP Company Name | Raison Sociale Facility Name | Nom de l'installation City | Ville CSD | SDR CA or CMA | AR ou RMR Economic Region | Région économique Province | Province Latitude | Latitude Longitude | Longitude Substance Name (English) | Nom de substance (Anglais) Substance Name (French) | Nom de substance (Français) Units | Unités Stack Emissions | Émissions de cheminées Direct Discharge | Évacuations directes Footnotes | Notes de bas de page
0 2021 3657.0 Cameco Blind River Refinery Blind River NaN NaN NaN ON 46.1814 -83.0177 Uranium Uranium kg 3.2 2.2 NaN
1 2021 3657.0 Cameco Blind River Refinery Blind River NaN NaN NaN ON 46.1814 -83.0177 Radium-226 Radium-226 MBq NRM | NRS 2.2 NaN
2 2021 3657.0 Cameco Blind River Refinery Blind River NaN NaN NaN ON 46.1814 -83.0177 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.009 NRM | NRS Estimated public dose is calculated incorporat...
3 2021 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Uranium Uranium kg 39 NRM | NRS NaN
4 2021 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.072 NRM | NRS Site 1, Estimated public dose is calculated in...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
189 2013 2247.0 Nordion (Canada) Inc. Nordion - Ottawa Ottawa NaN NaN NaN ON 45.3408 -75.9179 Iodine-131 Iode-131 GBq 3.90E-01 NRM | NRS NaN
190 2013 2247.0 Nordion (Canada) Inc. Nordion - Ottawa Ottawa NaN NaN NaN ON 45.3408 -75.9179 Xenon-133 Xénon-133 GBq 3.07E+04 NRM | NRS NaN
191 2013 2247.0 Nordion (Canada) Inc. Nordion - Ottawa Ottawa NaN NaN NaN ON 45.3408 -75.9179 Xenon-135 Xénon-135 GBq 2.82E+04 NRM | NRS NaN
192 2013 2247.0 Nordion (Canada) Inc. Nordion - Ottawa Ottawa NaN NaN NaN ON 45.3408 -75.9179 Xenon-135m Xénon-135m GBq 4.34E+04 NRM | NRS NaN
193 2013 2247.0 Nordion (Canada) Inc. Nordion - Ottawa Ottawa NaN NaN NaN ON 45.3408 -75.9179 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.0195 NRM | NRS Estimated public dose is calculated incorporat...

194 rows × 17 columns

(1) Why does the data start in 2013? Can we get older data?

In [55]:
# I'm creating a copy because I will need it later.

df_npf_0 = df_npf.copy()
In [56]:
df_npf["Facility Name | Nom de l'installation"].unique()
Out[56]:
array(['Blind River Refinery', 'Port Hope Conversion Facility',
       'Cameco Fuel Manufacturing', 'BWXT - Toronto',
       'BWXT - Peterborough', 'SRBT', 'Nordion - Ottawa'], dtype=object)
In [57]:
# Renaming columns to English only:

df_npf.rename(columns={'Year | Année': 'Year', 'NPRI ID | ID INRP': 'NPRI ID','Company Name | Raison Sociale':'Company Name',"Facility Name | Nom de l'installation":'Facility Name', 'City | Ville':'City', 'CSD | SDR':'CSD','CA or CMA | AR ou RMR':'CA or CMA', 'Economic Region | Région économique':'Economic Region','Province | Province':'Province', 'Latitude | Latitude':'Latitude', 'Longitude | Longitude':'Longitude','Substance Name (English) | Nom de substance (Anglais)':'Substance Name (English)','Units | Unités':'Units', 'Stack Emissions | Émissions de cheminées':'Stack Emissions','Direct Discharge | Évacuations directes':'Direct Discharge','Footnotes | Notes de bas de page':'Footnotes'}, inplace = True)
df_npf.head()
Out[57]:
Year NPRI ID Company Name Facility Name City CSD CA or CMA Economic Region Province Latitude Longitude Substance Name (English) Substance Name (French) | Nom de substance (Français) Units Stack Emissions Direct Discharge Footnotes
0 2021 3657.0 Cameco Blind River Refinery Blind River NaN NaN NaN ON 46.1814 -83.0177 Uranium Uranium kg 3.2 2.2 NaN
1 2021 3657.0 Cameco Blind River Refinery Blind River NaN NaN NaN ON 46.1814 -83.0177 Radium-226 Radium-226 MBq NRM | NRS 2.2 NaN
2 2021 3657.0 Cameco Blind River Refinery Blind River NaN NaN NaN ON 46.1814 -83.0177 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.009 NRM | NRS Estimated public dose is calculated incorporat...
3 2021 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Uranium Uranium kg 39 NRM | NRS NaN
4 2021 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.072 NRM | NRS Site 1, Estimated public dose is calculated in...

I noticed some values are given as "NRM | NRS" (Not Required to Monitor). I will summarize which values are reported that way before replacing them with zeros so the columns can be plotted.

In [58]:
# Stack Emissions column first:

df_npf_miss_stack = df_npf[df_npf['Stack Emissions'] == 'NRM | NRS']
df_npf_miss_stack[['Year','Facility Name', 'Substance Name (English)', 'Stack Emissions']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
Out[58]:
Year Facility Name Substance Name (English) Stack Emissions
174 2013 Blind River Refinery Radium-226 NRM | NRS
153 2014 Blind River Refinery Radium-226 NRM | NRS
132 2015 Blind River Refinery Radium-226 NRM | NRS
111 2016 Blind River Refinery Radium-226 NRM | NRS
89 2017 Blind River Refinery Radium-226 NRM | NRS
67 2018 Blind River Refinery Radium-226 NRM | NRS
45 2019 Blind River Refinery Radium-226 NRM | NRS
23 2020 Blind River Refinery Radium-226 NRM | NRS
1 2021 Blind River Refinery Radium-226 NRM | NRS

(2) Summary of Missing Data (NRM) for Stack Emissions:

  • Blind River Refinery: Radium-226 from 2013 to 2021 (only facility to report Radium-226).
In [59]:
# Direct Discharge column next:

df_npf_miss_discharge = df_npf[df_npf['Direct Discharge'] == 'NRM | NRS']
df_npf_miss_discharge[['Year','Facility Name', 'Substance Name (English)', 'Direct Discharge']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
Out[59]:
Year Facility Name Substance Name (English) Direct Discharge
183 2013 BWXT - Peterborough Estimated public dose (see footnote) NRM | NRS
162 2014 BWXT - Peterborough Estimated public dose (see footnote) NRM | NRS
141 2015 BWXT - Peterborough Estimated public dose (see footnote) NRM | NRS
120 2016 BWXT - Peterborough Estimated public dose (see footnote) NRM | NRS
99 2017 BWXT - Peterborough Estimated public dose (see footnote) NRM | NRS
... ... ... ... ...
100 2017 SRBT Tritium (HTO) NRM | NRS
78 2018 SRBT Tritium (HTO) NRM | NRS
56 2019 SRBT Tritium (HTO) NRM | NRS
34 2020 SRBT Tritium (HTO) NRM | NRS
12 2021 SRBT Tritium (HTO) NRM | NRS

176 rows × 4 columns

(3) Summary of Missing Data (NRM) for Direct Discharge:

  • BWXT - Peterborough: Uranium from 2013 to 2021.
  • BWXT - Toronto: Uranium from 2013 to 2021.
  • Cameco Fuel Manufacturing: Uranium from 2013 to 2021.
  • Nordion - Ottawa: All substances from 2013 to 2021 (Cobalt-60, Iodine-125, Iodine-131, Xenon-133, Xenon-135, & Xenon-135m).
  • Port Hope Conversion Facility: Uranium from 2013 to 2021.
  • SRBT: Elemental Tritium (HT) from 2013 to 2021; Tritium (HTO) from 2013 to 2021 (only facility to report these substances).

Note 1: Estimated Public Dose is 'NRM | NRS' for every Direct Discharge entry, because it is reported under Stack Emissions "incorporating all major release pathways (emissions and discharges)", according to the footnote.

Note 2: Blind River Refinery is the only facility to report Uranium Direct Discharge values. SRBT & Nordion - Ottawa don't report Uranium at all.

Note 3: Blind River Refinery is the only facility to report Direct Discharge values.
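The summaries above were compiled by reading the filtered tables. A grouped aggregation could produce the same kind of overview directly. This is a minimal sketch, assuming the uncleaned string columns. The sample rows are hypothetical and only mimic the shape of `df_npf`.

```python
import pandas as pd

# Hypothetical sample in the shape of df_npf before cleaning (strings, as read from the CSV).
sample = pd.DataFrame({
    "Year": [2019, 2020, 2021, 2019, 2020, 2021],
    "Facility Name": ["SRBT"] * 3 + ["Blind River Refinery"] * 3,
    "Substance Name (English)": ["Tritium (HTO)"] * 3 + ["Uranium"] * 3,
    "Direct Discharge": ["NRM | NRS"] * 3 + ["2.0", "2.1", "2.2"],
})

nrm = sample[sample["Direct Discharge"] == "NRM | NRS"]

# First year, last year, and number of NRM rows per facility and substance.
summary = (nrm.groupby(["Facility Name", "Substance Name (English)"])["Year"]
              .agg(["min", "max", "count"])
              .reset_index())
print(summary)
```

Comparing `count` with the number of years between `min` and `max` would also reveal ranges that are not continuous.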

In [60]:
# I noticed some values are "0" or "0.00E+00".

df_npf[df_npf['Stack Emissions'].isin(['0', '0.00E+00'])][['Year','Facility Name', 'Substance Name (English)', 'Stack Emissions', 'Direct Discharge']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
Out[60]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
183 2013 BWXT - Peterborough Estimated public dose (see footnote) 0 NRM | NRS
162 2014 BWXT - Peterborough Estimated public dose (see footnote) 0 NRM | NRS
141 2015 BWXT - Peterborough Estimated public dose (see footnote) 0 NRM | NRS
120 2016 BWXT - Peterborough Estimated public dose (see footnote) 0 NRM | NRS
99 2017 BWXT - Peterborough Estimated public dose (see footnote) 0 NRM | NRS
77 2018 BWXT - Peterborough Estimated public dose (see footnote) 0 NRM | NRS
33 2020 BWXT - Peterborough Estimated public dose (see footnote) 0 NRM | NRS
11 2021 BWXT - Peterborough Estimated public dose (see footnote) 0 NRM | NRS
10 2021 BWXT - Peterborough Uranium 0 NRM | NRS
37 2020 Nordion - Ottawa Cobalt-60 0.00E+00 NRM | NRS
82 2018 Nordion - Ottawa Iodine-125 0.00E+00 NRM | NRS
60 2019 Nordion - Ottawa Iodine-125 0.00E+00 NRM | NRS
38 2020 Nordion - Ottawa Iodine-125 0.00E+00 NRM | NRS
16 2021 Nordion - Ottawa Iodine-125 0.00E+00 NRM | NRS
61 2019 Nordion - Ottawa Iodine-131 0.00E+00 NRM | NRS
39 2020 Nordion - Ottawa Iodine-131 0.00E+00 NRM | NRS
17 2021 Nordion - Ottawa Iodine-131 0.00E+00 NRM | NRS
106 2017 Nordion - Ottawa Xenon-133 0.00E+00 NRM | NRS
84 2018 Nordion - Ottawa Xenon-133 0.00E+00 NRM | NRS
62 2019 Nordion - Ottawa Xenon-133 0.00E+00 NRM | NRS
40 2020 Nordion - Ottawa Xenon-133 0.00E+00 NRM | NRS
18 2021 Nordion - Ottawa Xenon-133 0.00E+00 NRM | NRS
107 2017 Nordion - Ottawa Xenon-135 0.00E+00 NRM | NRS
85 2018 Nordion - Ottawa Xenon-135 0.00E+00 NRM | NRS
63 2019 Nordion - Ottawa Xenon-135 0.00E+00 NRM | NRS
41 2020 Nordion - Ottawa Xenon-135 0.00E+00 NRM | NRS
19 2021 Nordion - Ottawa Xenon-135 0.00E+00 NRM | NRS
108 2017 Nordion - Ottawa Xenon-135m 0.00E+00 NRM | NRS
86 2018 Nordion - Ottawa Xenon-135m 0.00E+00 NRM | NRS
64 2019 Nordion - Ottawa Xenon-135m 0.00E+00 NRM | NRS
42 2020 Nordion - Ottawa Xenon-135m 0.00E+00 NRM | NRS
20 2021 Nordion - Ottawa Xenon-135m 0.00E+00 NRM | NRS

(4) Summary of Zero Values (all Stack Emissions):

  • BWXT - Peterborough: Estimated Public Dose from 2013 to 2018 & 2020 to 2021; Uranium 2021.
  • Nordion - Ottawa: Cobalt-60 2020; Iodine-125 from 2018 to 2021; Iodine-131 from 2019 to 2021; Xenon-133 from 2017 to 2021; Xenon-135 from 2017 to 2021; Xenon-135m from 2017 to 2021.
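The filter above matches two specific spellings of zero, '0' and '0.00E+00'. A sketch of a format-independent check is to convert to numbers first, so that any other spelling (for example '0.0') would also be caught. The sample series is illustrative.

```python
import pandas as pd

raw = pd.Series(["0", "0.00E+00", "3.90E-01", "NRM | NRS", "39"], name="Stack Emissions")

# Non-numeric codes such as 'NRM | NRS' become NaN instead of raising an error.
numeric = pd.to_numeric(raw, errors="coerce")

# True only for entries that parse to exactly zero, whatever their text format.
is_zero = numeric == 0
print(raw[is_zero])
```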

Creating Geographic Reference Table¶

In [61]:
df_npf_geography = df_npf[['Facility Name', 'NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude']]
df_npf_geography
Out[61]:
Facility Name NPRI ID Company Name City CSD CA or CMA Economic Region Province Latitude Longitude
0 Blind River Refinery 3657.0 Cameco Blind River NaN NaN NaN ON 46.1814 -83.0177
1 Blind River Refinery 3657.0 Cameco Blind River NaN NaN NaN ON 46.1814 -83.0177
2 Blind River Refinery 3657.0 Cameco Blind River NaN NaN NaN ON 46.1814 -83.0177
3 Port Hope Conversion Facility 1145.0 Cameco Port Hope NaN NaN NaN ON 43.9437 -78.2954
4 Port Hope Conversion Facility 1145.0 Cameco Port Hope NaN NaN NaN ON 43.9437 -78.2954
... ... ... ... ... ... ... ... ... ... ...
189 Nordion - Ottawa 2247.0 Nordion (Canada) Inc. Ottawa NaN NaN NaN ON 45.3408 -75.9179
190 Nordion - Ottawa 2247.0 Nordion (Canada) Inc. Ottawa NaN NaN NaN ON 45.3408 -75.9179
191 Nordion - Ottawa 2247.0 Nordion (Canada) Inc. Ottawa NaN NaN NaN ON 45.3408 -75.9179
192 Nordion - Ottawa 2247.0 Nordion (Canada) Inc. Ottawa NaN NaN NaN ON 45.3408 -75.9179
193 Nordion - Ottawa 2247.0 Nordion (Canada) Inc. Ottawa NaN NaN NaN ON 45.3408 -75.9179

194 rows × 10 columns

In [62]:
# Cleaning the geography dataframe:

# Reassigning (instead of inplace=True on a column subset) avoids pandas' SettingWithCopyWarning:
df_npf_geography = df_npf_geography.drop_duplicates().reset_index(drop=True)
df_npf_geography
Out[62]:
Facility Name NPRI ID Company Name City CSD CA or CMA Economic Region Province Latitude Longitude
0 Blind River Refinery 3657.0 Cameco Blind River NaN NaN NaN ON 46.1814 -83.0177
1 Port Hope Conversion Facility 1145.0 Cameco Port Hope NaN NaN NaN ON 43.9437 -78.2954
2 Cameco Fuel Manufacturing NaN Cameco Port Hope NaN NaN NaN ON 43.9540 -78.2748
3 BWXT - Toronto NaN BWXT Nuclear Energy Canada Toronto NaN NaN NaN ON 43.6680 -79.4468
4 BWXT - Peterborough NaN BWXT Nuclear Energy Canada Peterborough NaN NaN NaN ON 44.2961 -78.3337
5 SRBT NaN SRB Technologies (Canada) Inc. Pembroke NaN NaN NaN ON 45.8054 -77.1180
6 Nordion - Ottawa 2247.0 Nordion (Canada) Inc. Ottawa NaN NaN NaN ON 45.3408 -75.9179

Cleaning of the Dataframe¶

In [63]:
# Cleaning the NRM (Not required to monitor) values so I can convert the columns into numeric for later plotting:

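# Assign the result back: replace(..., inplace=True) on a selected column may not update df_npf in recent pandas versions.
df_npf['Stack Emissions'] = df_npf['Stack Emissions'].replace('NRM | NRS', 0)
df_npf['Direct Discharge'] = df_npf['Direct Discharge'].replace('NRM | NRS', 0)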
In [64]:
# Converted columns to numeric for plotting:

df_npf['Stack Emissions'] = pd.to_numeric(df_npf['Stack Emissions'])
df_npf['Direct Discharge'] = pd.to_numeric(df_npf['Direct Discharge'])
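Replacing 'NRM | NRS' with 0 makes the columns plottable, but afterwards a "not required to monitor" entry can no longer be told apart from a reported zero. One possible alternative, sketched here as an assumption rather than as the method used in this notebook, keeps a flag column and converts the code to NaN. The sample rows reuse 2021 values from the table above, and the ' NRM' column suffix is hypothetical.

```python
import pandas as pd

sample = pd.DataFrame({
    "Substance Name (English)": ["Uranium", "Radium-226", "Uranium"],
    "Stack Emissions": ["3.2", "NRM | NRS", "39"],
    "Direct Discharge": ["2.2", "2.2", "NRM | NRS"],
})

for column in ["Stack Emissions", "Direct Discharge"]:
    # Remember which entries were 'NRM | NRS' before converting.
    sample[column + " NRM"] = sample[column] == "NRM | NRS"
    # Unparseable entries become NaN, so they are not confused with a measured zero.
    sample[column] = pd.to_numeric(sample[column], errors="coerce")

print(sample)
```

Matplotlib leaves a gap in a line where the value is NaN, so the plots would show unmonitored years as breaks instead of drops to zero.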
In [65]:
# The footnotes of Estimated Public Dose for Port Hope Conversion Facility specify whether a value is for "Site 1" or "Site 2":

df_npf[(df_npf['Facility Name'] == 'Port Hope Conversion Facility') & (df_npf['Substance Name (English)'] == 'Estimated public dose (see footnote)')].sort_values(by = ['Footnotes', 'Year'])
Out[65]:
Year NPRI ID Company Name Facility Name City CSD CA or CMA Economic Region Province Latitude Longitude Substance Name (English) Substance Name (French) | Nom de substance (Français) Units Stack Emissions Direct Discharge Footnotes
177 2013 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.021 0.0 Estimated public dose is calculated incorporat...
156 2014 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.012 0.0 Estimated public dose is calculated incorporat...
135 2015 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.006 0.0 Estimated public dose is calculated incorporat...
114 2016 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.020 0.0 Estimated public dose is calculated incorporat...
92 2017 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.110 0.0 Site 1, Estimated public dose is calculated in...
70 2018 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.142 0.0 Site 1, Estimated public dose is calculated in...
48 2019 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.080 0.0 Site 1, Estimated public dose is calculated in...
26 2020 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.129 0.0 Site 1, Estimated public dose is calculated in...
4 2021 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.072 0.0 Site 1, Estimated public dose is calculated in...
93 2017 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.153 0.0 Site 2, Estimated public dose is calculated in...
71 2018 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.173 0.0 Site 2, Estimated public dose is calculated in...
49 2019 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.127 0.0 Site 2, Estimated public dose is calculated in...
27 2020 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.117 0.0 Site 2, Estimated public dose is calculated in...
5 2021 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.086 0.0 Site 2, Estimated public dose is calculated in...

(5) Why are Port Hope Conversion Facility's Estimated Public Dose values combined from 2013 to 2016, but separated between Site 1 & Site 2 from 2017 to 2021?
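One way to make this change in reporting explicit is to pull the site label out of the Footnotes column before that column is dropped. The sketch below is only illustrative: it uses a small hand-built frame with three of the values from Out[65], the footnote strings are abbreviated stand-ins for the truncated originals, and the `Site` column and the `'Combined'` label are assumptions introduced here, not part of the dataset.

```python
import pandas as pd

# Three Port Hope rows from Out[65]; footnote text is an abbreviated stand-in.
sample = pd.DataFrame({
    'Year': [2016, 2017, 2017],
    'Stack Emissions': [0.020, 0.110, 0.153],
    'Footnotes': [
        'Estimated public dose is calculated ...',
        'Site 1, Estimated public dose is calculated ...',
        'Site 2, Estimated public dose is calculated ...',
    ],
})

# Capture a leading "Site N" label; rows without one get NaN and are labelled 'Combined'.
sample['Site'] = (
    sample['Footnotes'].str.extract(r'^(Site \d+)', expand=False).fillna('Combined')
)

# One column per site label makes the switch in reporting visible at a glance.
by_site = sample.pivot(index='Year', columns='Site', values='Stack Emissions')
print(by_site)
```

Applied to the full dataframe, a table like this would show the combined column filled up to 2016 and the two site columns filled from 2017.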

In [66]:
df_npf.drop(columns=['NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude', 'Substance Name (French) | Nom de substance (Français)', 'Footnotes'], inplace=True)
df_npf.head()
Out[66]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
0 2021 Blind River Refinery Uranium kg 3.200 2.2
1 2021 Blind River Refinery Radium-226 MBq 0.000 2.2
2 2021 Blind River Refinery Estimated public dose (see footnote) mSv/a 0.009 0.0
3 2021 Port Hope Conversion Facility Uranium kg 39.000 0.0
4 2021 Port Hope Conversion Facility Estimated public dose (see footnote) mSv/a 0.072 0.0
In [67]:
# Aggregating Port Hope Site 1 & Site 2 estimated public dose so that each year has a single value to plot:

df_npf = df_npf.groupby(['Year', 'Facility Name', 'Substance Name (English)', 'Units'],as_index=False).agg({'Stack Emissions': 'sum', 'Direct Discharge': 'sum'})
df_npf[(df_npf['Facility Name'] == 'Port Hope Conversion Facility') & (df_npf['Substance Name (English)'] == 'Estimated public dose (see footnote)')]
df_npf.head()
Out[67]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
0 2013 BWXT - Peterborough Estimated public dose (see footnote) mSv/a 0.000000 0.0
1 2013 BWXT - Peterborough Uranium kg 0.000013 0.0
2 2013 BWXT - Toronto Estimated public dose (see footnote) mSv/a 0.000600 0.0
3 2013 BWXT - Toronto Uranium kg 0.010400 0.0
4 2013 Blind River Refinery Estimated public dose (see footnote) mSv/a 0.012000 0.0
In [68]:
# Saving the clean dataframe to build a dashboard in Tableau.

df_npf.to_csv("./Datasets/df_npf.csv", index=True, header=True)

Plotting by Substance¶

In [69]:
df_npf['Substance Name (English)'].unique()
Out[69]:
array(['Estimated public dose (see footnote)', 'Uranium', 'Radium-226',
       'Cobalt-60', 'Iodine-125', 'Iodine-131', 'Xenon-133', 'Xenon-135',
       'Xenon-135m', 'Elemental Tritium (HT)', 'Tritium (HTO)'],
      dtype=object)

Estimated public dose¶

In [70]:
df_npf_epd = df_npf[df_npf['Substance Name (English)'] == 'Estimated public dose (see footnote)']
df_npf_epd.head()
Out[70]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
0 2013 BWXT - Peterborough Estimated public dose (see footnote) mSv/a 0.0000 0.0
2 2013 BWXT - Toronto Estimated public dose (see footnote) mSv/a 0.0006 0.0
4 2013 Blind River Refinery Estimated public dose (see footnote) mSv/a 0.0120 0.0
7 2013 Cameco Fuel Manufacturing Estimated public dose (see footnote) mSv/a 0.0130 0.0
10 2013 Nordion - Ottawa Estimated public dose (see footnote) mSv/a 0.0195 0.0
In [71]:
# Estimated public dose is calculated incorporating all major release pathways (emissions and discharges).

plt.figure(figsize=(16,6))

year = df_npf_epd['Year'].unique()

for facility in df_npf_epd['Facility Name'].unique():
    plt.plot(year, df_npf_epd[df_npf_epd['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Estimated public dose [mSv/a]', size=12)
plt.legend(df_npf_epd['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(6) Why does Port Hope show elevated values from 2017 to 2021, with peaks in 2018 & 2020? (Site 1 & Site 2 contribute comparable amounts to these peaks.)

(7) Why is there a peak in Cameco Fuel Manufacturing in 2021?
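Because cell In [67] sums Site 1 and Site 2, part of the rise in Port Hope's values from 2017 may come from the aggregation itself. A minimal check, using only the Port Hope values printed in Out[65], compares the summed series with the larger of the two per-site values. Using the maximum as the comparison is an assumption made here for illustration, not the author's method.

```python
import pandas as pd

# Port Hope estimated public dose [mSv/a], copied from Out[65].
site1 = pd.Series({2017: 0.110, 2018: 0.142, 2019: 0.080, 2020: 0.129, 2021: 0.072})
site2 = pd.Series({2017: 0.153, 2018: 0.173, 2019: 0.127, 2020: 0.117, 2021: 0.086})
combined = pd.Series({2013: 0.021, 2014: 0.012, 2015: 0.006, 2016: 0.020})

# What the groupby sum produces, versus the larger single-site value per year.
summed = pd.concat([combined, site1 + site2]).sort_index()
per_site_max = pd.concat([site1, site2], axis=1).max(axis=1)
max_site = pd.concat([combined, per_site_max]).sort_index()

comparison = pd.DataFrame({'Summed': summed, 'Max of sites': max_site})
print(comparison)

# Years whose summed value exceeds both neighbours (interior local maxima only).
is_peak = (summed > summed.shift(1)) & (summed > summed.shift(-1))
peak_years = summed.index[is_peak].tolist()
print(peak_years)
```

With these values the interior local maxima of the summed series fall in 2018 and 2020, and the jump between 2016 and 2017 is present in both columns, although it is smaller when only the larger site is considered.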

Uranium¶

In [72]:
df_npf_uranium = df_npf[df_npf['Substance Name (English)'] == 'Uranium']
df_npf_uranium.head()
Out[72]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
1 2013 BWXT - Peterborough Uranium kg 0.000013 0.0
3 2013 BWXT - Toronto Uranium kg 0.010400 0.0
6 2013 Blind River Refinery Uranium kg 4.100000 3.6
8 2013 Cameco Fuel Manufacturing Uranium kg 0.510000 0.0
17 2013 Port Hope Conversion Facility Uranium kg 68.400000 0.0
In [73]:
plt.figure(figsize=(16,6))

year = df_npf_uranium['Year'].unique()

for facility in df_npf_uranium['Facility Name'].unique():
    plt.plot(year, df_npf_uranium[df_npf_uranium['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Uranium - Stack Emissions [kg]', size=12)
plt.legend(df_npf_uranium['Facility Name'].unique(), loc='upper center')
plt.grid()

plt.show()

(8) Why does Port Hope emit so much more uranium than the other facilities, with peaks in 2013 & 2019?

In [74]:
# Plotting without Port Hope to take a look at the rest:

df_npf_uranium2 = df_npf_uranium[df_npf_uranium['Facility Name'] != 'Port Hope Conversion Facility']

plt.figure(figsize=(16,6))

year = df_npf_uranium2['Year'].unique()

for facility in df_npf_uranium2['Facility Name'].unique():
    plt.plot(year, df_npf_uranium2[df_npf_uranium2['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Uranium - Stack Emissions [kg]', size=12)
plt.legend(df_npf_uranium2['Facility Name'].unique(), loc='upper center')
plt.grid()

plt.show()

(9) Why does Blind River Refinery have peaks in 2013 & 2020?

In [75]:
plt.figure(figsize=(16,6))

year = df_npf_uranium['Year'].unique()

for facility in df_npf_uranium['Facility Name'].unique():
    plt.plot(year, df_npf_uranium[df_npf_uranium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Uranium - Direct Discharge [kg]', size=12)
plt.legend(df_npf_uranium['Facility Name'].unique(), loc='upper center')
plt.grid()

plt.show()

(10) Why does Blind River Refinery have a peak in 2013/2014?

  • (Blind River Refinery is the only one to report values. Addressed in Question (3))
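The plotting loops in this section pass one shared `year` array for every facility, which only works while each facility reports every year. A hedged alternative is to pivot the data to one column per facility, so that each series carries its own years and gaps appear as NaN. The frame below is built from uranium stack-emission values shown elsewhere in this notebook; the 2020 Port Hope row is deliberately left out to simulate a missing year.

```python
import pandas as pd

# Uranium stack emissions [kg] taken from outputs shown in this notebook;
# the 2020 Port Hope row (44.4) is deliberately omitted to simulate a gap.
rows = [
    (2013, 'Blind River Refinery', 4.1),
    (2013, 'Port Hope Conversion Facility', 68.4),
    (2020, 'Blind River Refinery', 4.8),
    (2021, 'Blind River Refinery', 3.2),
    (2021, 'Port Hope Conversion Facility', 39.0),
]
long_df = pd.DataFrame(rows, columns=['Year', 'Facility Name', 'Stack Emissions'])

# One column per facility, indexed by year; missing facility-years become NaN.
wide = long_df.pivot(index='Year', columns='Facility Name', values='Stack Emissions')
print(wide)

# wide.plot(figsize=(16, 6), grid=True) would then draw one line per facility.
```

The same pivot with `values='Direct Discharge'` would cover the discharge plots.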

All other substances are each reported by a single facility:

Summary of substances per location:

  • Blind River Refinery: Estimated public dose, Radium-226, Uranium.
  • BWXT - Peterborough: Estimated public dose, Uranium.
  • BWXT - Toronto: Estimated public dose, Uranium.
  • Cameco Fuel Manufacturing: Estimated public dose, Uranium.
  • Nordion - Ottawa: Estimated public dose, Cobalt-60, Iodine-125, Iodine-131, Xenon-133, Xenon-135, Xenon-135m.
  • Port Hope Conversion Facility: Estimated public dose, Uranium.
  • SRBT: Estimated public dose, Elemental Tritium (HT), Tritium (HTO).
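A summary like the one above can be derived from the dataframe rather than written by hand. The sketch below assumes the cleaned column names used in `df_npf` and runs on a small toy frame so that it is self-contained; the name `toy` is introduced here for illustration.

```python
import pandas as pd

# Toy subset with the same column names as df_npf; only the names matter here.
toy = pd.DataFrame({
    'Facility Name': ['SRBT', 'SRBT', 'SRBT', 'BWXT - Toronto', 'BWXT - Toronto'],
    'Substance Name (English)': [
        'Tritium (HTO)', 'Elemental Tritium (HT)', 'Tritium (HTO)',
        'Uranium', 'Estimated public dose (see footnote)',
    ],
})

# Sorted list of distinct substances reported by each facility.
substances = (
    toy.groupby('Facility Name')['Substance Name (English)']
       .unique()
       .apply(sorted)
)
for facility, names in substances.items():
    print(f"{facility}: {', '.join(names)}")
```

Replacing `toy` with `df_npf` should reproduce the bullet list, up to ordering.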

Radium-226¶

In [76]:
df_npf_radium = df_npf[df_npf['Substance Name (English)'] == 'Radium-226']
df_npf_radium.head()
Out[76]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
5 2013 Blind River Refinery Radium-226 MBq 0.0 1.93
26 2014 Blind River Refinery Radium-226 MBq 0.0 1.81
47 2015 Blind River Refinery Radium-226 MBq 0.0 1.06
68 2016 Blind River Refinery Radium-226 MBq 0.0 0.92
89 2017 Blind River Refinery Radium-226 MBq 0.0 1.04
In [77]:
# No Stack Emissions reported (addressed in question (2)).

plt.figure(figsize=(16,6))

plt.plot(df_npf_radium['Year'], df_npf_radium['Direct Discharge'])
plt.xticks(df_npf_radium['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Radium-226 - Direct Discharge [MBq]', size=12)
plt.legend(df_npf_radium['Facility Name'], loc='upper left')
plt.grid()

plt.show()

(11) Why does Blind River Refinery have peaks in 2013, 2019, & 2021?

Elemental Tritium (HT)¶

In [78]:
df_npf_HT = df_npf[df_npf['Substance Name (English)'] == 'Elemental Tritium (HT)']
df_npf_HT.head()
Out[78]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
18 2013 SRBT Elemental Tritium (HT) GBq 61100.0 0.0
39 2014 SRBT Elemental Tritium (HT) GBq 54800.0 0.0
60 2015 SRBT Elemental Tritium (HT) GBq 44700.0 0.0
81 2016 SRBT Elemental Tritium (HT) GBq 22700.0 0.0
102 2017 SRBT Elemental Tritium (HT) GBq 17600.0 0.0
In [79]:
# No Direct Discharge reported (addressed in question (3)).

plt.figure(figsize=(16,6))

plt.plot(df_npf_HT['Year'], df_npf_HT['Stack Emissions'])
plt.xticks(df_npf_HT['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Elemental Tritium (HT) - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_HT['Facility Name'], loc='upper right')
plt.grid()

plt.show()

(12) Why does SRBT have a spike in 2013 to 2015 with a peak in 2013?

Tritium (HTO)¶

In [80]:
df_npf_HTO = df_npf[df_npf['Substance Name (English)'] == 'Tritium (HTO)']
df_npf_HTO.head()
Out[80]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
20 2013 SRBT Tritium (HTO) GBq 17800.0 0.0
41 2014 SRBT Tritium (HTO) GBq 10700.0 0.0
62 2015 SRBT Tritium (HTO) GBq 11500.0 0.0
83 2016 SRBT Tritium (HTO) GBq 6290.0 0.0
104 2017 SRBT Tritium (HTO) GBq 7200.0 0.0
In [81]:
# No Direct Discharge reported (addressed in question (3)).

plt.figure(figsize=(16,6))

plt.plot(df_npf_HTO['Year'], df_npf_HTO['Stack Emissions'])
plt.xticks(df_npf_HTO['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_HTO['Facility Name'], loc='upper right')
plt.grid()

plt.show()

(13) Why does SRBT have a peak in 2013?

Cobalt-60¶

In [82]:
df_npf_cobalt = df_npf[df_npf['Substance Name (English)'] == 'Cobalt-60']
df_npf_cobalt.head()
Out[82]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
9 2013 Nordion - Ottawa Cobalt-60 GBq 0.0050 0.0
30 2014 Nordion - Ottawa Cobalt-60 GBq 0.0050 0.0
51 2015 Nordion - Ottawa Cobalt-60 GBq 0.0050 0.0
72 2016 Nordion - Ottawa Cobalt-60 GBq 0.0060 0.0
93 2017 Nordion - Ottawa Cobalt-60 GBq 0.0034 0.0
In [83]:
# No Direct Discharge reported (addressed in question (3)). Zero value for 2020 addressed in question (4).

plt.figure(figsize=(16,6))

plt.plot(df_npf_cobalt['Year'], df_npf_cobalt['Stack Emissions'])
plt.xticks(df_npf_cobalt['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Cobalt-60 - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_cobalt['Facility Name'], loc='upper right')
plt.grid()

plt.show()

(14) Why does Nordion-Ottawa have a spike in 2013 to 2018 with a peak in 2016?

Iodine-125¶

In [84]:
df_npf_i125 = df_npf[df_npf['Substance Name (English)'] == 'Iodine-125']
df_npf_i125.head()
Out[84]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
11 2013 Nordion - Ottawa Iodine-125 GBq 0.2300 0.0
32 2014 Nordion - Ottawa Iodine-125 GBq 0.1400 0.0
53 2015 Nordion - Ottawa Iodine-125 GBq 0.1200 0.0
74 2016 Nordion - Ottawa Iodine-125 GBq 0.2100 0.0
95 2017 Nordion - Ottawa Iodine-125 GBq 0.0012 0.0
In [85]:
# No Direct Discharge reported (addressed in question (3)). Zero value for 2018 to 2021 addressed in question (4).

plt.figure(figsize=(16,6))

plt.plot(df_npf_i125['Year'], df_npf_i125['Stack Emissions'])
plt.xticks(df_npf_i125['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Iodine-125 - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_i125['Facility Name'], loc='upper right')
plt.grid()

plt.show()

(15) Why does Nordion-Ottawa have a spike in 2013 to 2016 with peaks in 2013 & 2016?

Iodine-131¶

In [86]:
df_npf_i131 = df_npf[df_npf['Substance Name (English)'] == 'Iodine-131']
df_npf_i131.head()
Out[86]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
12 2013 Nordion - Ottawa Iodine-131 GBq 0.3900 0.0
33 2014 Nordion - Ottawa Iodine-131 GBq 0.4600 0.0
54 2015 Nordion - Ottawa Iodine-131 GBq 0.1500 0.0
75 2016 Nordion - Ottawa Iodine-131 GBq 0.3500 0.0
96 2017 Nordion - Ottawa Iodine-131 GBq 0.0008 0.0
In [87]:
# No Direct Discharge reported (addressed in question (3)). Zero value for 2019 to 2021 addressed in question (4).

plt.figure(figsize=(16,6))

plt.plot(df_npf_i131['Year'], df_npf_i131['Stack Emissions'])
plt.xticks(df_npf_i131['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Iodine-131 - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_i131['Facility Name'], loc='upper right')
plt.grid()

plt.show()

(16) Why does Nordion-Ottawa have a spike in 2013 to 2016 with peaks in 2014 & 2016, and why does it decrease so much from 2017 onward?

Xenon-133¶

In [88]:
df_npf_x133 = df_npf[df_npf['Substance Name (English)'] == 'Xenon-133']
df_npf_x133.head()
Out[88]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
13 2013 Nordion - Ottawa Xenon-133 GBq 30700.0 0.0
34 2014 Nordion - Ottawa Xenon-133 GBq 15000.0 0.0
55 2015 Nordion - Ottawa Xenon-133 GBq 11900.0 0.0
76 2016 Nordion - Ottawa Xenon-133 GBq 7280.0 0.0
97 2017 Nordion - Ottawa Xenon-133 GBq 0.0 0.0
In [89]:
# No Direct Discharge reported (addressed in question (3)). Zero value for 2017 to 2021 addressed in question (4).

plt.figure(figsize=(16,6))

plt.plot(df_npf_x133['Year'], df_npf_x133['Stack Emissions'])
plt.xticks(df_npf_x133['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Xenon-133 - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_x133['Facility Name'], loc='upper right')
plt.grid()

plt.show()

(17) Why does Nordion-Ottawa have a spike in 2013 to 2016 with a peak in 2013, and why does it decrease so much from 2017 onward?

Xenon-135¶

In [90]:
df_npf_x135 = df_npf[df_npf['Substance Name (English)'] == 'Xenon-135']
df_npf_x135.head()
Out[90]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
14 2013 Nordion - Ottawa Xenon-135 GBq 28200.0 0.0
35 2014 Nordion - Ottawa Xenon-135 GBq 13100.0 0.0
56 2015 Nordion - Ottawa Xenon-135 GBq 8240.0 0.0
77 2016 Nordion - Ottawa Xenon-135 GBq 4300.0 0.0
98 2017 Nordion - Ottawa Xenon-135 GBq 0.0 0.0
In [91]:
# No Direct Discharge reported (addressed in question (3)). Zero value for 2017 to 2021 addressed in question (4).

plt.figure(figsize=(16,6))

plt.plot(df_npf_x135['Year'], df_npf_x135['Stack Emissions'])
plt.xticks(df_npf_x135['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Xenon-135 - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_x135['Facility Name'], loc='upper right')
plt.grid()

plt.show()

(18) Why does Nordion-Ottawa have a spike in 2013 to 2016 with a peak in 2013, and why does it decrease so much from 2017 onward?

Xenon-135m¶

In [92]:
df_npf_x135m = df_npf[df_npf['Substance Name (English)'] == 'Xenon-135m']
df_npf_x135m.head()
Out[92]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
15 2013 Nordion - Ottawa Xenon-135m GBq 43400.0 0.0
36 2014 Nordion - Ottawa Xenon-135m GBq 18200.0 0.0
57 2015 Nordion - Ottawa Xenon-135m GBq 10800.0 0.0
78 2016 Nordion - Ottawa Xenon-135m GBq 5420.0 0.0
99 2017 Nordion - Ottawa Xenon-135m GBq 0.0 0.0
In [93]:
# No Direct Discharge reported (addressed in question (3)). Zero value for 2017 to 2021 addressed in question (4).

plt.figure(figsize=(16,6))

plt.plot(df_npf_x135m['Year'], df_npf_x135m['Stack Emissions'])
plt.xticks(df_npf_x135m['Year'])
plt.xlabel('Year', size=16)
plt.ylabel('Xenon-135m - Stack Emissions [GBq]', size=12)
plt.legend(df_npf_x135m['Facility Name'], loc='upper right')
plt.grid()

plt.show()

(19) Why does Nordion-Ottawa have a spike in 2013 to 2016 with a peak in 2013, and why does it decrease so much from 2017 onward?

  • It's worth noting that all the xenon substances behave the same way.
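The observation that the xenon series move together can be given a rough number with a pairwise correlation. The sketch below uses only the 2013 to 2017 values printed in Out[88], Out[90] and Out[92]; with five points per series it is an indication rather than a statistical result.

```python
import pandas as pd

# Nordion - Ottawa stack emissions [GBq] for 2013 to 2017,
# copied from Out[88], Out[90] and Out[92].
xenon = pd.DataFrame(
    {
        'Xenon-133':  [30700.0, 15000.0, 11900.0, 7280.0, 0.0],
        'Xenon-135':  [28200.0, 13100.0, 8240.0, 4300.0, 0.0],
        'Xenon-135m': [43400.0, 18200.0, 10800.0, 5420.0, 0.0],
    },
    index=[2013, 2014, 2015, 2016, 2017],
)

# Pairwise Pearson correlation between the three series.
corr = xenon.corr()
print(corr.round(3))
```

Values close to 1 in every cell are consistent with a common cause for the decline, which is what questions (17) to (19) ask about.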

Individual Plotting by Substance & Facility¶

In [94]:
facilities = df_npf['Facility Name'].unique()

for f in facilities:
    df = df_npf[df_npf['Facility Name'] == f]
    print(f,'\n')
    subs = df['Substance Name (English)'].unique()
    for s in subs:
        df2 = df[df['Substance Name (English)'] == s]
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10,8))
        fig.subplots_adjust(hspace=0.5)
    
        ax1.plot(df2['Year'], df2['Stack Emissions'], color='green')
        ax1.set_title(s + ' - Stack Emissions', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax1.set_xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax1.grid()
    
        ax2.plot(df2['Year'], df2['Direct Discharge'], color='red')
        ax2.set_title(s + ' - Direct Discharge', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax2.set_xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax2.grid()
    
        plt.show()
BWXT - Peterborough 

BWXT - Toronto 

Blind River Refinery 

Cameco Fuel Manufacturing 

Nordion - Ottawa 

Port Hope Conversion Facility 

SRBT 

Comparison with 2020 Dataframe¶

In [95]:
df_npf_2020 = pd.read_csv("./Datasets/2020/Nuclear Processing Facilities.csv", encoding='latin1')
df_npf_2020.head()
Out[95]:
Year | Année NPRI ID | ID INRP Company Name | Raison Sociale Facility Name | Nom de l'installation City | Ville CSD | SDR CA or CMA | AR ou RMR Economic Region | Région économique Province | Province Latitude | Latitude Longitude | Longitude Substance Name (English) | Nom de substance (Anglais) Substance Name (French) | Nom de substance (Français) Units | Unités Stack Emissions | Émissions de cheminées Direct Discharge | Évacuations directes Footnotes | Notes de bas de page
0 2020 3657.0 Cameco Blind River Refinery Blind River NaN NaN NaN ON 46.1814 -83.0177 Uranium Uranium kg 4.8 2.8 NaN
1 2020 3657.0 Cameco Blind River Refinery Blind River NaN NaN NaN ON 46.1814 -83.0177 Radium-226 Radium-226 MBq NRM | NRS 1.4 NaN
2 2020 3657.0 Cameco Blind River Refinery Blind River NaN NaN NaN ON 46.1814 -83.0177 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.009 NRM | NRS Estimated public dose is calculated incorporat...
3 2020 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Uranium Uranium kg 44.4 NRM | NRS NaN
4 2020 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.129 NRM | NRS Site 1, Estimated public dose is calculated in...
In [96]:
# Remove the 2021 rows from the current dataframe and compare the remaining rows with the 2020 release.

df_npf_2021 = df_npf_0[df_npf_0['Year | Année'] != 2021].reset_index(drop = True)
df_npf_2021.head()
Out[96]:
Year | Année NPRI ID | ID INRP Company Name | Raison Sociale Facility Name | Nom de l'installation City | Ville CSD | SDR CA or CMA | AR ou RMR Economic Region | Région économique Province | Province Latitude | Latitude Longitude | Longitude Substance Name (English) | Nom de substance (Anglais) Substance Name (French) | Nom de substance (Français) Units | Unités Stack Emissions | Émissions de cheminées Direct Discharge | Évacuations directes Footnotes | Notes de bas de page
0 2020 3657.0 Cameco Blind River Refinery Blind River NaN NaN NaN ON 46.1814 -83.0177 Uranium Uranium kg 4.8 2.8 NaN
1 2020 3657.0 Cameco Blind River Refinery Blind River NaN NaN NaN ON 46.1814 -83.0177 Radium-226 Radium-226 MBq NRM | NRS 1.4 NaN
2 2020 3657.0 Cameco Blind River Refinery Blind River NaN NaN NaN ON 46.1814 -83.0177 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.009 NRM | NRS Estimated public dose is calculated incorporat...
3 2020 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Uranium Uranium kg 44.4 NRM | NRS NaN
4 2020 1145.0 Cameco Port Hope Conversion Facility Port Hope NaN NaN NaN ON 43.9437 -78.2954 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.129 NRM | NRS Site 1, Estimated public dose is calculated in...
In [97]:
# Concatenate both dataframes and keep only the non-duplicated rows to see the differences. This produces a df with the current (2021) release's values first, followed by the 2020 release's values:

df = pd.concat([df_npf_2021,df_npf_2020]).drop_duplicates(keep=False)
df

# 2 changes only.
Out[97]:
Year | Année NPRI ID | ID INRP Company Name | Raison Sociale Facility Name | Nom de l'installation City | Ville CSD | SDR CA or CMA | AR ou RMR Economic Region | Région économique Province | Province Latitude | Latitude Longitude | Longitude Substance Name (English) | Nom de substance (Anglais) Substance Name (French) | Nom de substance (Français) Units | Unités Stack Emissions | Émissions de cheminées Direct Discharge | Évacuations directes Footnotes | Notes de bas de page
50 2018 NaN Cameco Cameco Fuel Manufacturing Port Hope NaN NaN NaN ON 43.954 -78.2748 Uranium Uranium kg 1.25 NRM | NRS NaN
72 2017 NaN Cameco Cameco Fuel Manufacturing Port Hope NaN NaN NaN ON 43.954 -78.2748 Uranium Uranium kg 0.57 NRM | NRS NaN
50 2018 NaN Cameco Cameco Fuel Manufacturing Port Hope NaN NaN NaN ON 43.954 -78.2748 Uranium Uranium kg 1.26 NRM | NRS NaN
72 2017 NaN Cameco Cameco Fuel Manufacturing Port Hope NaN NaN NaN ON 43.954 -78.2748 Uranium Uranium kg 0.58 NRM | NRS NaN

(20) Why did these 2 sets of values change between reports? Why wasn't the change documented anywhere?
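The concat-and-drop approach in In [97] lists changed rows twice, and `drop_duplicates(keep=False)` would also discard any row that happens to be repeated within a single release. A hedged alternative is to merge the two releases on identifying columns and compare the values side by side. The sketch below uses the four values from Out[97]; the shortened column name `Substance` and the suffixes are assumptions made for illustration.

```python
import pandas as pd

key = ['Year', 'Facility Name', 'Substance']

# Cameco Fuel Manufacturing uranium stack emissions [kg] from Out[97]:
# the first two rows of that output (current release) versus the last two (2020 release).
current = pd.DataFrame({
    'Year': [2018, 2017],
    'Facility Name': ['Cameco Fuel Manufacturing'] * 2,
    'Substance': ['Uranium'] * 2,
    'Stack Emissions': [1.25, 0.57],
})
previous = current.assign(**{'Stack Emissions': [1.26, 0.58]})

# Align the two releases on the key columns, then keep rows whose value differs.
merged = current.merge(
    previous, on=key, how='outer', suffixes=('_current', '_2020'), indicator=True
)
changed = merged[merged['Stack Emissions_current'] != merged['Stack Emissions_2020']]
print(changed[key + ['Stack Emissions_current', 'Stack Emissions_2020', '_merge']])
```

The `_merge` column additionally separates rows that exist in only one release (`left_only`, `right_only`) from rows present in both.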


Canadian Nuclear Laboratories¶

First Look at the Dataframe¶

In [98]:
df_cnl = pd.read_csv("./Datasets/Canadian Nuclear Laboratories.csv")
df_cnl
Out[98]:
Year | Année NPRI ID | ID INRP Company Name | Raison Sociale Facility Name | Nom de l'installation City | Ville CSD | SDR CA or CMA | AR ou RMR Economic Region | Région économique Province | Province Latitude | Latitude Longitude | Longitude Substance Name (English) | Nom de substance (Anglais) Substance Name (French) | Nom de substance (Français) Units | Unités Stack Emissions | Émissions de cheminées Direct Discharge | Évacuations directes Footnotes | Notes de bas de page
0 2021 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Elemental Tritium (HT) Tritium élémentaire Bq 2.08E+12 NRM | NRS NaN
1 2021 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Tritium (HTO) Tritium (Eau tritiée) Bq 2.49E+13 1.50E+13 NaN
2 2021 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Carbon-14 Carbone-14 Bq 0.00E+00 NRM | NRS NaN
3 2021 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Total noble gases Total des gaz nobles Bq-MeV 0.00E+00 NRM | NRS NaN
4 2021 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Iodine-125 Iode-125 Bq 1.76E+06 NRM | NRS NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
283 2013 NaN Canadian Nuclear Laboratories / Laboratoires N... Douglas Point Tiverton Kincardine NaN Stratford--Bruce Peninsula ON 44.3267 -81.6000 Tritium (HTO) Tritium (Eau tritiée) Bq 1.59E+11 8.73E+10 NaN
284 2013 NaN Canadian Nuclear Laboratories / Laboratoires N... Douglas Point Tiverton Kincardine NaN Stratford--Bruce Peninsula ON 44.3267 -81.6000 Particulate gross beta Particules bêta brutes Bq NRM | NRS 5.31E+07 NaN
285 2013 NaN Canadian Nuclear Laboratories / Laboratoires N... Douglas Point Tiverton Kincardine NaN Stratford--Bruce Peninsula ON 44.3267 -81.6000 Estimated public dose (see footnote) Dose estimée au public (voir note de bas de page) mSv/a 0.0013 NRM | NRS Includes the entire Bruce site. Estimated publ...
286 2013 NaN Canadian Nuclear Laboratories / Laboratoires N... Nuclear Power Demonstration Rolphton NaN NaN NaN ON 46.1868 -77.6578 Tritium (HTO) Tritium (Eau tritiée) Bq 6.86E+10 1.41E+11 NaN
287 2013 NaN Canadian Nuclear Laboratories / Laboratoires N... Nuclear Power Demonstration Rolphton NaN NaN NaN ON 46.1868 -77.6578 Particulate gross beta Particules bêta brutes Bq 6.63E+04 9.76E+05 NaN

288 rows × 17 columns

(1) Why does the Data start in 2013? Can we get older data?

In [99]:
# I'm creating a copy because I will need it later.

df_cnl_0 = df_cnl.copy()
In [100]:
df_cnl["Facility Name | Nom de l'installation"].unique()
Out[100]:
array(['Chalk River Laboratories', 'Whiteshell Laboratories',
       'Port Granby Project', 'Port Hope Project', 'Douglas Point',
       'Nuclear Power Demonstration'], dtype=object)
In [101]:
# Renaming columns to English only:

df_cnl.rename(columns={'Year | Année': 'Year', 'NPRI ID | ID INRP': 'NPRI ID','Company Name | Raison Sociale':'Company Name',"Facility Name | Nom de l'installation":'Facility Name', 'City | Ville':'City', 'CSD | SDR':'CSD','CA or CMA | AR ou RMR':'CA or CMA', 'Economic Region | Région économique':'Economic Region','Province | Province':'Province', 'Latitude | Latitude':'Latitude', 'Longitude | Longitude':'Longitude','Substance Name (English) | Nom de substance (Anglais)':'Substance Name (English)','Units | Unités':'Units', 'Stack Emissions | Émissions de cheminées':'Stack Emissions','Direct Discharge | Évacuations directes':'Direct Discharge','Footnotes | Notes de bas de page':'Footnotes'}, inplace = True)
df_cnl.head()
Out[101]:
Year NPRI ID Company Name Facility Name City CSD CA or CMA Economic Region Province Latitude Longitude Substance Name (English) Substance Name (French) | Nom de substance (Français) Units Stack Emissions Direct Discharge Footnotes
0 2021 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Elemental Tritium (HT) Tritium élémentaire Bq 2.08E+12 NRM | NRS NaN
1 2021 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Tritium (HTO) Tritium (Eau tritiée) Bq 2.49E+13 1.50E+13 NaN
2 2021 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Carbon-14 Carbone-14 Bq 0.00E+00 NRM | NRS NaN
3 2021 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Total noble gases Total des gaz nobles Bq-MeV 0.00E+00 NRM | NRS NaN
4 2021 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Iodine-125 Iode-125 Bq 1.76E+06 NRM | NRS NaN

I noticed some values are reported as "NRM | NRS" (NRM: Not Required to Monitor). I will summarize which values are reported this way before replacing them with zeros so that the data can be plotted.

In [102]:
# Stack Emission column first:

df_cnl_miss_stack = df_cnl[df_cnl['Stack Emissions'] == 'NRM | NRS']
df_cnl_miss_stack[['Year','Facility Name', 'Substance Name (English)', 'Stack Emissions']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
Out[102]:
Year Facility Name Substance Name (English) Stack Emissions
270 2013 Chalk River Laboratories Particulate gross alpha NRM | NRS
240 2014 Chalk River Laboratories Particulate gross alpha NRM | NRS
210 2015 Chalk River Laboratories Particulate gross alpha NRM | NRS
179 2016 Chalk River Laboratories Particulate gross alpha NRM | NRS
142 2017 Chalk River Laboratories Particulate gross alpha NRM | NRS
... ... ... ... ...
14 2021 Whiteshell Laboratories Strontium-90 NRM | NRS
149 2017 Whiteshell Laboratories Uranium-total NRM | NRS
112 2018 Whiteshell Laboratories Uranium-total NRM | NRS
78 2019 Whiteshell Laboratories Uranium-total NRM | NRS
44 2020 Whiteshell Laboratories Uranium-total NRM | NRS

104 rows × 4 columns

(2) Summary of Missing Data (NRM) for Stack Emissions:

  • Chalk River Laboratories: Alpha from 2013 to 2021; Beta from 2013 to 2021; Strontium-90 from 2013 to 2021.
  • Douglas Point: Beta from 2013 to 2015.
  • Port Granby Project: Radium-226 from 2013 to 2021; Uranium from 2013 to 2021.
  • Port Hope Project: Radium-226 from 2013 to 2021; Uranium from 2013 to 2021.
  • Whiteshell Laboratories: Americium-241 from 2017 to 2020; Cesium-137 from 2013 to 2021; Plutonium-238 from 2017 to 2020; Plutonium-239/240 from 2017 to 2020; Strontium-90 from 2013 to 2021; Uranium total from 2017 to 2020.

Note: Whiteshell Laboratories has a "Uranium-total" substance. I originally thought of combining it with "Uranium", but it has different units, so I didn't.
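The year ranges in the summary above can be computed instead of read off the table. The sketch below is a minimal version that assumes the cleaned column names; the facility and substance names and the single numeric value in the toy frame are hypothetical.

```python
import pandas as pd

# Toy rows with hypothetical facility and substance names;
# 'NRM | NRS' is the marker used in the dataset.
toy = pd.DataFrame({
    'Year': [2013, 2014, 2015, 2013, 2014],
    'Facility Name': ['Facility A'] * 3 + ['Facility B'] * 2,
    'Substance Name (English)': ['Substance X'] * 5,
    'Stack Emissions': ['NRM | NRS', 'NRM | NRS', 'NRM | NRS', 'NRM | NRS', '1.00E+05'],
})

# Keep only the NRM rows, then report first year, last year and row count per group.
nrm = toy[toy['Stack Emissions'] == 'NRM | NRS']
summary = (
    nrm.groupby(['Facility Name', 'Substance Name (English)'])['Year']
       .agg(first_year='min', last_year='max', n_years='count')
       .reset_index()
)
print(summary)
```

Comparing `n_years` with `last_year - first_year + 1` flags ranges with gaps, which a plain "from X to Y" description would hide.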

In [103]:
# Direct Discharge column next:

df_cnl_miss_discharge = df_cnl[df_cnl['Direct Discharge'] == 'NRM | NRS']
df_cnl_miss_discharge[['Year','Facility Name', 'Substance Name (English)', 'Direct Discharge']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
Out[103]:
Year Facility Name Substance Name (English) Direct Discharge
268 2013 Chalk River Laboratories Argon-41 NRM | NRS
238 2014 Chalk River Laboratories Argon-41 NRM | NRS
208 2015 Chalk River Laboratories Argon-41 NRM | NRS
177 2016 Chalk River Laboratories Argon-41 NRM | NRS
141 2017 Chalk River Laboratories Argon-41 NRM | NRS
... ... ... ... ...
214 2015 Whiteshell Laboratories Tritium (HTO) NRM | NRS
183 2016 Whiteshell Laboratories Tritium (HTO) NRM | NRS
146 2017 Whiteshell Laboratories Tritium (HTO) NRM | NRS
109 2018 Whiteshell Laboratories Tritium (HTO) NRM | NRS
11 2021 Whiteshell Laboratories Tritium (HTO) NRM | NRS

116 rows × 4 columns

(3) Summary of Missing Data (NRM) for Direct Discharge:

  • Chalk River Laboratories: Argon-41 from 2013 to 2021; Carbon-14 from 2013 to 2021; Elemental Tritium from 2013 to 2021; Estimated Public Dose from 2013 to 2021; Iodine-125 from 2013 to 2021; Iodine-131 from 2013 to 2021; Noble Gases from 2013 to 2021; Xenon-133 from 2013 to 2016.
  • Douglas Point: Estimated Public Dose from 2013 to 2021; Carbon-14 in 2018 only (the only year it is reported).
  • Nuclear Power Demonstration: Estimated Public Dose from 2014 to 2021 (doesn't report for 2013).
  • Port Granby Project: Estimated Public Dose from 2014 to 2021 (doesn't report for 2013).
  • Port Hope Project: Estimated Public Dose from 2014 to 2021 (doesn't report for 2013).
  • Whiteshell Laboratories: Estimated Public Dose from 2014 to 2021 (doesn't report for 2013); Tritium from 2013 to 2018 & 2021.
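Replacing NRM with zero makes an unmonitored release look the same as a reported zero on a plot. As a hedged alternative, `pd.to_numeric` with `errors='coerce'` turns the marker into NaN, which matplotlib draws as a gap, while a boolean flag keeps track of where the marker was. The values below are taken from the Stack Emissions column of Out[98]; the variable names are introduced here for illustration.

```python
import pandas as pd

# Values in the string format used by the Canadian Nuclear Laboratories file (from Out[98]).
raw = pd.Series(['2.08E+12', 'NRM | NRS', '0.00E+00', '1.76E+06'], name='Stack Emissions')

is_nrm = raw.eq('NRM | NRS')                   # remember which entries were NRM
as_nan = pd.to_numeric(raw, errors='coerce')   # NRM becomes NaN: plotted as a gap
as_zero = as_nan.fillna(0.0)                   # NRM becomes 0: same as a reported zero

print(pd.DataFrame({'raw': raw, 'is_nrm': is_nrm, 'as_nan': as_nan, 'as_zero': as_zero}))
```

Note that `errors='coerce'` converts any unparseable string to NaN, not only the NRM marker, so checking `is_nrm` against `as_nan.isna()` is a useful sanity test on the real data.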
In [104]:
# I noticed some values are "0.00E+00".

df_cnl[df_cnl['Stack Emissions'] == '0.00E+00'][['Year','Facility Name', 'Substance Name (English)', 'Stack Emissions', 'Direct Discharge']].sort_values(by =['Facility Name', 'Substance Name (English)', 'Year'])
Out[104]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
70 2019 Chalk River Laboratories Argon-41 0.00E+00 NRM | NRS
36 2020 Chalk River Laboratories Argon-41 0.00E+00 NRM | NRS
6 2021 Chalk River Laboratories Argon-41 0.00E+00 NRM | NRS
2 2021 Chalk River Laboratories Carbon-14 0.00E+00 NRM | NRS
67 2019 Chalk River Laboratories Total noble gases 0.00E+00 NRM | NRS
33 2020 Chalk River Laboratories Total noble gases 0.00E+00 NRM | NRS
3 2021 Chalk River Laboratories Total noble gases 0.00E+00 NRM | NRS
24 2021 Douglas Point Particulate gross alpha 0.00E+00 5.55E+06

(4) Summary of Zero Values (all Stack Emissions):

  • Chalk River Laboratories: Argon-41 from 2019 to 2021; Carbon-14 2021 only; Noble Gases from 2019 to 2021.
  • Douglas Point: Alpha 2021 only.
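Because a reported `0.00E+00` and an `NRM | NRS` marker mean different things, it may help to label each raw value before any replacement is made. This is a minimal sketch that assumes the raw cells are strings like those in the outputs above; the function `classify_value` and its four labels are hypothetical.

```python
def classify_value(raw):
    """Label a raw cell as 'not monitored', 'reported zero', 'reported value' or 'unparsed'."""
    text = str(raw).strip()
    if text == 'NRM | NRS':
        return 'not monitored'
    try:
        number = float(text)
    except ValueError:
        return 'unparsed'        # e.g. '<0.01' or a malformed exponent
    return 'reported zero' if number == 0 else 'reported value'

samples = ['NRM | NRS', '0.00E+00', '5.55E+06', '<0.01']
labels = [classify_value(s) for s in samples]
print(labels)
```

Applied to a raw column, for example with `df_cnl['Stack Emissions'].map(classify_value).value_counts()`, this would give a quick count of each kind of entry.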

Creating Geographic Reference Table¶

In [105]:
df_cnl_geography = df_cnl[['Facility Name', 'NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude']]
df_cnl_geography
Out[105]:
Facility Name NPRI ID Company Name City CSD CA or CMA Economic Region Province Latitude Longitude
0 Chalk River Laboratories 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628
1 Chalk River Laboratories 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628
2 Chalk River Laboratories 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628
3 Chalk River Laboratories 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628
4 Chalk River Laboratories 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628
... ... ... ... ... ... ... ... ... ... ...
283 Douglas Point NaN Canadian Nuclear Laboratories / Laboratoires N... Tiverton Kincardine NaN Stratford--Bruce Peninsula ON 44.3267 -81.6000
284 Douglas Point NaN Canadian Nuclear Laboratories / Laboratoires N... Tiverton Kincardine NaN Stratford--Bruce Peninsula ON 44.3267 -81.6000
285 Douglas Point NaN Canadian Nuclear Laboratories / Laboratoires N... Tiverton Kincardine NaN Stratford--Bruce Peninsula ON 44.3267 -81.6000
286 Nuclear Power Demonstration NaN Canadian Nuclear Laboratories / Laboratoires N... Rolphton NaN NaN NaN ON 46.1868 -77.6578
287 Nuclear Power Demonstration NaN Canadian Nuclear Laboratories / Laboratoires N... Rolphton NaN NaN NaN ON 46.1868 -77.6578

288 rows × 10 columns

In [106]:
# Cleaning the geography dataframe:

# Reassigning instead of using inplace=True on a slice of df_cnl avoids a SettingWithCopyWarning.
df_cnl_geography = df_cnl_geography.drop_duplicates().reset_index(drop=True)
df_cnl_geography
Out[106]:
Facility Name NPRI ID Company Name City CSD CA or CMA Economic Region Province Latitude Longitude
0 Chalk River Laboratories 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628
1 Whiteshell Laboratories 7434.0 Canadian Nuclear Laboratories / Laboratoires N... Pinawa Pinawa Division No. 1 Southeast / Sud-est MB 50.1789 -96.0604
2 Port Granby Project 30760.0 Canadian Nuclear Laboratories / Laboratoires N... Clarington NaN NaN NaN ON 43.9106 -78.4511
3 Port Hope Project 30761.0 Canadian Nuclear Laboratories / Laboratoires N... Port Hope NaN NaN NaN ON 43.9608 -78.3407
4 Douglas Point NaN Canadian Nuclear Laboratories / Laboratoires N... Tiverton Kincardine NaN Stratford--Bruce Peninsula ON 44.3267 -81.6000
5 Nuclear Power Demonstration NaN Canadian Nuclear Laboratories / Laboratoires N... Rolphton NaN NaN NaN ON 46.1868 -77.6578

Cleaning of the Dataframe¶

In [107]:
# Replacing the NRM (Not required to monitor) markers with 0 so I can convert the columns into numeric for later plotting.
# Note that after this step they can no longer be told apart from reported zeros.

df_cnl['Stack Emissions'] = df_cnl['Stack Emissions'].replace('NRM | NRS', 0)
df_cnl['Direct Discharge'] = df_cnl['Direct Discharge'].replace('NRM | NRS', 0)
In [108]:
# Replacing "<0.01" with 0.01 so the column can be converted to numeric, and correcting "3.08+10" to "3.08E+10" & "4.43+07" to "4.43E+07" (rows 29 & 30).

df_cnl['Stack Emissions'] = df_cnl['Stack Emissions'].replace('<0.01', 0.01)
df_cnl['Direct Discharge'] = df_cnl['Direct Discharge'].replace({'3.08+10': 3.08E+10, '4.43+07': 4.43E+07})
In [109]:
# Converted columns to numeric for plotting:

df_cnl['Stack Emissions'] = pd.to_numeric(df_cnl['Stack Emissions'])
df_cnl['Direct Discharge'] = pd.to_numeric(df_cnl['Direct Discharge'])
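As a possible alternative to the cleaning cells above, the sketch below parses each raw cell in one pass and keeps `NRM | NRS` as a missing value (NaN) rather than 0, so that not-monitored years are not later drawn as zero releases. The function `parse_release` is hypothetical, and the regular expression assumes the only malformed pattern is a missing `E`, as in `3.08+10`.

```python
import math
import re

# Matches a mantissa followed directly by a signed exponent, e.g. '3.08+10'
MISSING_E = re.compile(r'^(\d+(?:\.\d+)?)([+-]\d+)$')

def parse_release(raw):
    """Return (value, censored) for one raw cell.

    value is NaN for 'NRM | NRS'; censored is True for '<x' entries,
    in which case value is the stated upper bound x.
    """
    text = str(raw).strip()
    if text == 'NRM | NRS':
        return math.nan, False
    censored = text.startswith('<')
    if censored:
        text = text[1:]
    match = MISSING_E.match(text)
    if match:                                   # '3.08+10' -> '3.08E+10'
        text = f"{match.group(1)}E{match.group(2)}"
    return float(text), censored

for raw in ['NRM | NRS', '<0.01', '3.08+10', '4.43+07', '2.49E+13']:
    print(raw, '->', parse_release(raw))
```

Keeping the `censored` flag separate means a "less than" entry stays recognizable as an upper bound instead of being treated as a measured value.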
In [110]:
# I noticed "Port Hope Project" has 4 extra values in 2017 & 2018 for Radium-226 & Uranium:

df_cnl[(df_cnl['Year'].isin([2017, 2018])) & (df_cnl['Substance Name (English)'].isin(['Radium-226', 'Uranium'])) & (df_cnl['Facility Name'] == 'Port Hope Project')].sort_values(['Substance Name (English)'])[['Year', 'Facility Name', 'Substance Name (English)', 'Units', 'Stack Emissions', 'Direct Discharge', 'Footnotes']]

# Note that the 0.0 values were "NRM" before I changed them.
# Footnote explains: Releases from non-routine operations.
Out[110]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge Footnotes
122 2018 Port Hope Project Radium-226 Bq 0.0 7.000000e+05 NaN
124 2018 Port Hope Project Radium-226 Bq 0.0 5.680000e+09 Releases from non-routine operations | Rejets ...
159 2017 Port Hope Project Radium-226 Bq 0.0 8.000000e+05 NaN
161 2017 Port Hope Project Radium-226 Bq 0.0 1.590000e+10 Releases from non-routine operations | Rejets ...
123 2018 Port Hope Project Uranium kg 0.0 5.000000e-01 NaN
125 2018 Port Hope Project Uranium kg 0.0 1.460000e+01 Releases from non-routine operations | Rejets ...
160 2017 Port Hope Project Uranium kg 0.0 1.000000e-01 NaN
162 2017 Port Hope Project Uranium kg 0.0 1.101000e+02 Releases from non-routine operations | Rejets ...

(5) Why are there extra Direct Discharge values for Uranium & Radium-226 in 2017 & 2018?
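One way to keep this question visible in the data is to split routine and non-routine rows into separate columns instead of only summing them. The sketch below uses the four Radium-226 rows from the output above. The column name `Release Type` and its labels are hypothetical, the footnote text is abbreviated to its English part, and the code assumes a footnote is present only on the non-routine rows.

```python
import pandas as pd

# Radium-226 rows for Port Hope Project, copied from the output above
rows = pd.DataFrame({
    'Year': [2018, 2018, 2017, 2017],
    'Direct Discharge': [7.00e5, 5.68e9, 8.00e5, 1.59e10],
    'Footnotes': [None, 'Releases from non-routine operations',
                  None, 'Releases from non-routine operations'],
})

# Assumption: a footnote is present only on the non-routine rows
rows['Release Type'] = rows['Footnotes'].notna().map(
    {True: 'non-routine', False: 'routine'})

# One row per year, one column per release type
by_type = rows.pivot(index='Year', columns='Release Type',
                     values='Direct Discharge')
by_type['total'] = by_type['routine'] + by_type['non-routine']
print(by_type)
```

The `total` column matches what a plain sum per year would give, while the other two columns preserve where the release came from.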

In [111]:
df_cnl.drop(columns=['NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude', 'Substance Name (French) | Nom de substance (Français)', 'Footnotes'], inplace=True)
df_cnl.head()
Out[111]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
0 2021 Chalk River Laboratories Elemental Tritium (HT) Bq 2.080000e+12 0.000000e+00
1 2021 Chalk River Laboratories Tritium (HTO) Bq 2.490000e+13 1.500000e+13
2 2021 Chalk River Laboratories Carbon-14 Bq 0.000000e+00 0.000000e+00
3 2021 Chalk River Laboratories Total noble gases Bq-MeV 0.000000e+00 0.000000e+00
4 2021 Chalk River Laboratories Iodine-125 Bq 1.760000e+06 0.000000e+00
In [112]:
# I'm going to aggregate the extra Port Hope values, to be able to plot:

df_cnl = df_cnl.groupby(['Year', 'Facility Name', 'Substance Name (English)', 'Units'],as_index=False).agg({'Stack Emissions': 'sum', 'Direct Discharge': 'sum'})
df_cnl[(df_cnl['Year'].isin([2017, 2018])) & (df_cnl['Substance Name (English)'].isin(['Radium-226', 'Uranium'])) & (df_cnl['Facility Name'] == 'Port Hope Project')]
df_cnl.head()
Out[112]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
0 2013 Chalk River Laboratories Argon-41 Bq 8.460000e+15 0.0
1 2013 Chalk River Laboratories Carbon-14 Bq 5.740000e+11 0.0
2 2013 Chalk River Laboratories Elemental Tritium (HT) Bq 1.590000e+12 0.0
3 2013 Chalk River Laboratories Estimated public dose (see footnote) mSv/a 5.914000e-02 0.0
4 2013 Chalk River Laboratories Iodine-125 Bq 2.840000e+08 0.0
In [113]:
# I'm saving the clean dataframe to do a dashboard in Tableau.

# Raw string so the backslashes in the Windows path are not read as escape sequences.
df_cnl.to_csv(r".\Datasets\df_cnl.csv", index=True, header=True)

Plotting by Substance¶

In [114]:
df_cnl['Substance Name (English)'].unique()
Out[114]:
array(['Argon-41', 'Carbon-14', 'Elemental Tritium (HT)',
       'Estimated public dose (see footnote)', 'Iodine-125', 'Iodine-131',
       'Particulate gross alpha', 'Particulate gross beta',
       'Strontium-90', 'Total noble gases', 'Tritium (HTO)', 'Xenon-133',
       'Radium-226', 'Uranium', 'Cesium-137', 'Americium-241',
       'Plutonium-238', 'Plutonium-239/240', 'Uranium-total'],
      dtype=object)

Carbon-14¶

In [115]:
df_cnl_c14 = df_cnl[df_cnl['Substance Name (English)'] == 'Carbon-14']
df_cnl_c14
Out[115]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
1 2013 Chalk River Laboratories Carbon-14 Bq 5.740000e+11 0.0
27 2014 Chalk River Laboratories Carbon-14 Bq 8.690000e+11 0.0
57 2015 Chalk River Laboratories Carbon-14 Bq 3.770000e+11 0.0
87 2016 Chalk River Laboratories Carbon-14 Bq 4.850000e+11 0.0
118 2017 Chalk River Laboratories Carbon-14 Bq 4.910000e+11 0.0
152 2018 Chalk River Laboratories Carbon-14 Bq 2.590000e+11 0.0
162 2018 Douglas Point Carbon-14 Bq 1.510000e+09 0.0
187 2019 Chalk River Laboratories Carbon-14 Bq 3.440000e+10 0.0
221 2020 Chalk River Laboratories Carbon-14 Bq 2.610000e+10 0.0
255 2021 Chalk River Laboratories Carbon-14 Bq 0.000000e+00 0.0

(6) Why does Douglas Point only report Carbon-14 Emissions for 2018?

In [116]:
# No Direct Discharge reported (addressed in question (3)). Removing Douglas Point (it only has a 2018 value) to be able to plot.

df_cnl_c14 = df_cnl_c14[df_cnl_c14['Facility Name'] != 'Douglas Point']

plt.figure(figsize=(16,6))

year = df_cnl_c14['Year'].unique()

for facility in df_cnl_c14['Facility Name'].unique():
    plt.plot(year, df_cnl_c14[df_cnl_c14['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Carbon-14 - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_c14['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(7) Why does Chalk River Laboratories have a peak in 2014 & much lower values since 2019 reaching 0 in 2021?
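Questions like this one can be backed by a simple screening rule. The sketch below flags years whose value is more than 2 times or less than 0.5 times the series median, using the Chalk River Carbon-14 stack emissions from the table above. The thresholds are arbitrary assumptions and the helper `flag_unusual_years` is hypothetical; it is a screening aid, not a statistical test.

```python
from statistics import median

# Chalk River Laboratories, Carbon-14 stack emissions [Bq], from the table above
c14 = {2013: 5.74e11, 2014: 8.69e11, 2015: 3.77e11, 2016: 4.85e11,
       2017: 4.91e11, 2018: 2.59e11, 2019: 3.44e10, 2020: 2.61e10,
       2021: 0.0}

def flag_unusual_years(series, high=2.0, low=0.5):
    """Return two {year: ratio-to-median} dicts: unusually high and unusually low years."""
    mid = median(series.values())
    if mid == 0:                     # ratios are undefined for a zero median
        return {}, {}
    highs = {y: v / mid for y, v in series.items() if v > high * mid}
    lows = {y: v / mid for y, v in series.items() if v < low * mid}
    return highs, lows

highs, lows = flag_unusual_years(c14)
print('high:', sorted(highs))
print('low: ', sorted(lows))
```

With these thresholds the rule singles out 2014 as high and 2019 to 2021 as low, which lines up with the pattern the question describes.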

Estimated public dose¶

In [117]:
df_cnl_epd = df_cnl[df_cnl['Substance Name (English)'] == 'Estimated public dose (see footnote)']
df_cnl_epd.head()
Out[117]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
3 2013 Chalk River Laboratories Estimated public dose (see footnote) mSv/a 0.05914 0.0
12 2013 Douglas Point Estimated public dose (see footnote) mSv/a 0.00130 0.0
29 2014 Chalk River Laboratories Estimated public dose (see footnote) mSv/a 0.06000 0.0
38 2014 Douglas Point Estimated public dose (see footnote) mSv/a 0.00200 0.0
41 2014 Nuclear Power Demonstration Estimated public dose (see footnote) mSv/a 0.01000 0.0
In [118]:
# Whiteshell Laboratories, Port Granby Project, Port Hope Project, Nuclear Power Demonstration are missing 2013. I will have to plot them separately.

# No Direct Discharge reported (addressed in question (3)).

df_cnl_epd2 = df_cnl_epd[df_cnl_epd['Facility Name'].isin(['Chalk River Laboratories', 'Douglas Point'])]

plt.figure(figsize=(16,6))

year = df_cnl_epd2['Year'].unique()

for facility in df_cnl_epd2['Facility Name'].unique():
    plt.plot(year, df_cnl_epd2[df_cnl_epd2['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Estimated public dose - Stack Emissions [mSv/a]', size=12)
plt.legend(df_cnl_epd2['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(8) Why does Chalk River Laboratories have a spike from 2015 to 2017 & then plummet?

In [119]:
# No Direct Discharge reported (addressed in question (3)).

df_cnl_epd3 = df_cnl_epd[df_cnl_epd['Facility Name'].isin(['Whiteshell Laboratories', 'Port Granby Project', 'Port Hope Project', 'Nuclear Power Demonstration'])]

plt.figure(figsize=(16,6))

year = df_cnl_epd3['Year'].unique()

for facility in df_cnl_epd3['Facility Name'].unique():
    plt.plot(year, df_cnl_epd3[df_cnl_epd3['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Estimated public dose - Stack Emissions [mSv/a]', size=12)
plt.legend(df_cnl_epd3['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(9) Why don't Nuclear Power Demonstration, Port Granby, Port Hope, & Whiteshell Laboratories report anything for 2013?

(10) Why does Port Hope have a peak in 2015 & a spike between 2018 & 2020?

(11) Why does Port Granby have a peak in 2019?
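The facilities above had to be split across two charts because `plt.plot(year, ...)` needs every facility to have a value for every year. A possible alternative, sketched here under the assumption of a long-format frame like `df_cnl_epd`, is to pivot to one column per facility so that missing years become NaN gaps on a shared year axis. The helper name `wide_by_facility` is hypothetical, and the toy rows are the first rows of the estimated public dose table shown above.

```python
import pandas as pd

def wide_by_facility(df, value_column):
    """Pivot long data to Year x Facility, leaving NaN where a year is absent."""
    wide = df.pivot_table(index='Year', columns='Facility Name',
                          values=value_column, aggfunc='sum')
    # Reindex on the full year span so skipped years also show up as gaps
    full_years = range(int(wide.index.min()), int(wide.index.max()) + 1)
    return wide.reindex(full_years)

# First rows of the estimated public dose table shown above
epd = pd.DataFrame({
    'Year': [2013, 2013, 2014, 2014, 2014],
    'Facility Name': ['Chalk River Laboratories', 'Douglas Point',
                      'Chalk River Laboratories', 'Douglas Point',
                      'Nuclear Power Demonstration'],
    'Stack Emissions': [0.05914, 0.00130, 0.06000, 0.00200, 0.01000],
})
wide = wide_by_facility(epd, 'Stack Emissions')
print(wide)
```

Calling `wide.plot(marker='o')` on the result would then draw every facility in one figure, with the line simply absent where a facility has no value.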

Particulate gross alpha¶

In [120]:
df_cnl_alpha = df_cnl[df_cnl['Substance Name (English)'] == 'Particulate gross alpha']
df_cnl_alpha.head()
Out[120]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
6 2013 Chalk River Laboratories Particulate gross alpha Bq 0.0 46800000.0
22 2013 Whiteshell Laboratories Particulate gross alpha Bq 92400.0 114000000.0
32 2014 Chalk River Laboratories Particulate gross alpha Bq 0.0 907000000.0
52 2014 Whiteshell Laboratories Particulate gross alpha Bq 88200.0 47600000.0
62 2015 Chalk River Laboratories Particulate gross alpha Bq 0.0 694000000.0
In [121]:
# Douglas Point starts in 2016. I will have to plot it separately.

# Chalk River Laboratories Stack Emissions are 0, but it was previously 'NRM' (addressed in question (2)).

df_cnl_alpha2 = df_cnl_alpha[df_cnl_alpha['Facility Name'] != 'Douglas Point']
                             
plt.figure(figsize=(16,6))

year = df_cnl_alpha2['Year'].unique()

for facility in df_cnl_alpha2['Facility Name'].unique():
    plt.plot(year, df_cnl_alpha2[df_cnl_alpha2['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross alpha - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_alpha2['Facility Name'].unique(), loc='center left')
plt.grid()

plt.show()
In [122]:
df_cnl_alpha3 = df_cnl_alpha[df_cnl_alpha['Facility Name'] == 'Douglas Point']
                             
plt.figure(figsize=(16,6))

year = df_cnl_alpha3['Year'].unique()

for facility in df_cnl_alpha3['Facility Name'].unique():
    plt.plot(year, df_cnl_alpha3[df_cnl_alpha3['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross alpha - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_alpha3['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(12) Why doesn't Douglas Point report anything for 2013 to 2015 (both stack & direct)?

(13) Why does Douglas Point have a peak in 2020?

  • The zero value in 2021 was addressed in question (4).
In [123]:
plt.figure(figsize=(16,6))

year = df_cnl_alpha2['Year'].unique()

for facility in df_cnl_alpha2['Facility Name'].unique():
    plt.plot(year, df_cnl_alpha2[df_cnl_alpha2['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross alpha - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_alpha2['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(14) Why is Chalk River so much higher than Whiteshell Laboratories, & why does it have a peak in 2014?

In [124]:
plt.figure(figsize=(16,6))

year = df_cnl_alpha3['Year'].unique()

for facility in df_cnl_alpha3['Facility Name'].unique():
    plt.plot(year, df_cnl_alpha3[df_cnl_alpha3['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross alpha - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_alpha3['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(15) Why does Douglas Point have a peak in 2017 & 2018?

Particulate gross beta¶

In [125]:
df_cnl_beta = df_cnl[df_cnl['Substance Name (English)'] == 'Particulate gross beta']
df_cnl_beta.head()
Out[125]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
7 2013 Chalk River Laboratories Particulate gross beta Bq 0.0 3.020000e+09
13 2013 Douglas Point Particulate gross beta Bq 0.0 5.310000e+07
15 2013 Nuclear Power Demonstration Particulate gross beta Bq 66300.0 9.760000e+05
23 2013 Whiteshell Laboratories Particulate gross beta Bq 229000.0 3.860000e+08
33 2014 Chalk River Laboratories Particulate gross beta Bq 0.0 2.620000e+11
In [126]:
# Chalk River Laboratories & Douglas Point Stack Emissions are 0, but it was previously 'NRM' (addressed in question (2)).

plt.figure(figsize=(16,6))

year = df_cnl_beta['Year'].unique()

for facility in df_cnl_beta['Facility Name'].unique():
    plt.plot(year, df_cnl_beta[df_cnl_beta['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross beta - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_beta['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(16) Why does Whiteshell Laboratories have higher emissions, with peaks in 2014 & 2019?

(17) Why does Nuclear Power Demonstration have a peak in 2017?

(18) Why does Douglas Point have a peak in 2020?

In [127]:
plt.figure(figsize=(16,6))

year = df_cnl_beta['Year'].unique()

for facility in df_cnl_beta['Facility Name'].unique():
    plt.plot(year, df_cnl_beta[df_cnl_beta['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross beta - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_beta['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(19) Why does Chalk River Laboratories have higher discharge, with a peak in 2014?

In [128]:
# I'm plotting without Chalk River to see the rest.

df_cnl_beta2 = df_cnl_beta[df_cnl_beta['Facility Name'] != 'Chalk River Laboratories']

plt.figure(figsize=(16,6))

year = df_cnl_beta2['Year'].unique()

for facility in df_cnl_beta2['Facility Name'].unique():
    plt.plot(year, df_cnl_beta2[df_cnl_beta2['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Particulate gross beta - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_beta2['Facility Name'].unique(), loc='best')
plt.grid()

plt.show()

(20) Why does Whiteshell Laboratories have a peak in 2013 & a spike in 2019/2020/2021?

(21) Why does Nuclear Power Demonstration have peaks in 2017 & 2020?

Radium-226¶

In [129]:
df_cnl_radium = df_cnl[df_cnl['Substance Name (English)'] == 'Radium-226']
df_cnl_radium.head()
Out[129]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
17 2013 Port Granby Project Radium-226 Bq 0.0 5000000.0
19 2013 Port Hope Project Radium-226 Bq 0.0 6200000.0
45 2014 Port Granby Project Radium-226 Bq 0.0 5400000.0
48 2014 Port Hope Project Radium-226 Bq 0.0 7700000.0
75 2015 Port Granby Project Radium-226 Bq 0.0 4600000.0
In [130]:
# No Stack Emissions reported (addressed in question (2)).

plt.figure(figsize=(16,6))

year = df_cnl_radium['Year'].unique()

for facility in df_cnl_radium['Facility Name'].unique():
    plt.plot(year, df_cnl_radium[df_cnl_radium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Radium-226 - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_radium['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(22) Why does Port Hope have a huge peak in 2017/2018 (x10,000)? Looking at the data, it comes from the "releases from non-routine operations".

  • Addressed in question (5).
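The "x10,000" figure in question (22) can be checked against the rows printed under question (5). This is a short worked calculation using the Port Hope Project Radium-226 values from that output, comparing the non-routine row with the routine row of the same year; the variable names are illustrative.

```python
# Port Hope Project, Radium-226 direct discharge [Bq], from the rows under question (5)
routine = {2017: 8.00e5, 2018: 7.00e5}
non_routine = {2017: 1.59e10, 2018: 5.68e9}

# Ratio of the non-routine release to the routine release in the same year
ratios = {year: non_routine[year] / routine[year] for year in routine}
for year, ratio in sorted(ratios.items()):
    print(f"{year}: non-routine release is about {ratio:,.0f} times the routine one")
```

The ratios come out at roughly 20,000 for 2017 and 8,000 for 2018, so "x10,000" is a fair order-of-magnitude description.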

Strontium-90¶

In [131]:
df_cnl_strontium = df_cnl[df_cnl['Substance Name (English)'] == 'Strontium-90']
df_cnl_strontium.head()
Out[131]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
8 2013 Chalk River Laboratories Strontium-90 Bq 0.0 1.510000e+10
24 2013 Whiteshell Laboratories Strontium-90 Bq 0.0 6.970000e+07
34 2014 Chalk River Laboratories Strontium-90 Bq 0.0 2.260000e+11
54 2014 Whiteshell Laboratories Strontium-90 Bq 0.0 6.610000e+07
64 2015 Chalk River Laboratories Strontium-90 Bq 0.0 6.700000e+10
In [132]:
# No Stack Emissions reported (addressed in question (2)).

plt.figure(figsize=(16,6))

year = df_cnl_strontium['Year'].unique()

for facility in df_cnl_strontium['Facility Name'].unique():
    plt.plot(year, df_cnl_strontium[df_cnl_strontium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Strontium-90 - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_strontium['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(23) Why does Chalk River Laboratories have a peak in 2014?

Tritium (HTO)¶

In [133]:
df_cnl_hto = df_cnl[df_cnl['Substance Name (English)'] == 'Tritium (HTO)']
df_cnl_hto.head()
Out[133]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
10 2013 Chalk River Laboratories Tritium (HTO) Bq 2.460000e+14 2.430000e+12
14 2013 Douglas Point Tritium (HTO) Bq 1.590000e+11 8.730000e+10
16 2013 Nuclear Power Demonstration Tritium (HTO) Bq 6.860000e+10 1.410000e+11
25 2013 Whiteshell Laboratories Tritium (HTO) Bq 3.520000e+10 0.000000e+00
36 2014 Chalk River Laboratories Tritium (HTO) Bq 2.600000e+14 3.070000e+13
In [134]:
plt.figure(figsize=(16,6))

year = df_cnl_hto['Year'].unique()

for facility in df_cnl_hto['Facility Name'].unique():
    plt.plot(year, df_cnl_hto[df_cnl_hto['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_hto['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(24) Why does Chalk River Laboratories have higher emissions, & why do they come down in 2020?

In [135]:
# I'm plotting without Chalk River to see the rest.

df_cnl_hto2 = df_cnl_hto[df_cnl_hto['Facility Name'] != 'Chalk River Laboratories']

plt.figure(figsize=(16,6))

year = df_cnl_hto2['Year'].unique()

for facility in df_cnl_hto2['Facility Name'].unique():
    plt.plot(year, df_cnl_hto2[df_cnl_hto2['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_hto2['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(25) Why does Nuclear Power Demonstration have a peak in 2017?

(26) Why does Douglas Point have a peak in 2018?

In [136]:
# No Direct Discharge reported for Whiteshell Laboratories (NRM, addressed in question (3)).

plt.figure(figsize=(16,6))

year = df_cnl_hto['Year'].unique()

for facility in df_cnl_hto['Facility Name'].unique():
    plt.plot(year, df_cnl_hto[df_cnl_hto['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_hto['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(27) Why does Chalk River Laboratories have higher discharge, with a spike from 2014 to 2017?

In [137]:
# I'm plotting without Chalk River to see the rest.

plt.figure(figsize=(16,6))

year = df_cnl_hto2['Year'].unique()

for facility in df_cnl_hto2['Facility Name'].unique():
    plt.plot(year, df_cnl_hto2[df_cnl_hto2['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Tritium (HTO) - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_hto2['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(28) Why does Nuclear Power Demonstration have a spike from 2013 to 2015?

(29) Why does Douglas Point have a peak in 2013?

Uranium¶

In [138]:
df_cnl_uranium = df_cnl[df_cnl['Substance Name (English)'] == 'Uranium']
df_cnl_uranium.head()
Out[138]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
18 2013 Port Granby Project Uranium kg 0.0 49.2
20 2013 Port Hope Project Uranium kg 0.0 25.4
46 2014 Port Granby Project Uranium kg 0.0 36.7
49 2014 Port Hope Project Uranium kg 0.0 23.0
76 2015 Port Granby Project Uranium kg 0.0 29.0
In [139]:
# No Stack Emissions reported (addressed in question (2)).

plt.figure(figsize=(16,6))

year = df_cnl_uranium['Year'].unique()

for facility in df_cnl_uranium['Facility Name'].unique():
    plt.plot(year, df_cnl_uranium[df_cnl_uranium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Uranium - Direct Discharge [kg]', size=12)
plt.legend(df_cnl_uranium['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(30) Why does Port Granby have a spike from 2013 to 2016?

(31) Why does Port Hope have a peak in 2017, & why does it come down so much after 2018? Looking at the data, the peak comes from the "releases from non-routine operations".

  • Addressed in question (5).

All the other substances are each reported by a single location:

Summary of substances per location:

  • Whiteshell Laboratories: Americium-241, Cesium-137, Plutonium-238, Plutonium-239/240, Uranium-total.
  • Chalk River Laboratories: Argon-41, Elemental Tritium (HT), Iodine-125, Iodine-131, Total noble gases, Xenon-133.
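This grouping can also be reproduced from the data. The following is a minimal sketch, assuming a long-format frame with the same `Facility Name` and `Substance Name (English)` columns as `df_cnl`; the helper `single_facility_substances` is hypothetical and the toy rows are illustrative, not the full dataset.

```python
import pandas as pd

def single_facility_substances(df):
    """Return {facility: [substances reported by that facility only]}."""
    pairs = df[['Facility Name', 'Substance Name (English)']].drop_duplicates()
    # Number of distinct facilities reporting each substance
    counts = pairs.groupby('Substance Name (English)')['Facility Name'].transform('nunique')
    unique_pairs = pairs[counts == 1]
    groups = unique_pairs.groupby('Facility Name')['Substance Name (English)']
    return {facility: sorted(group) for facility, group in groups}

# Toy rows: Tritium (HTO) is shared, the other two substances are not
toy = pd.DataFrame({
    'Facility Name': ['Chalk River Laboratories', 'Chalk River Laboratories',
                      'Whiteshell Laboratories', 'Whiteshell Laboratories'],
    'Substance Name (English)': ['Argon-41', 'Tritium (HTO)',
                                 'Cesium-137', 'Tritium (HTO)'],
})
result = single_facility_substances(toy)
print(result)
```

Run on the full frame, this would also reveal any substance that is reported by only one facility but is missing from a hand-written list.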

Americium-241¶

In [140]:
df_cnl_a241 = df_cnl[df_cnl['Substance Name (English)'] == 'Americium-241']
df_cnl_a241.head()
Out[140]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
141 2017 Whiteshell Laboratories Americium-241 Bq 0.0 5100000.0
176 2018 Whiteshell Laboratories Americium-241 Bq 0.0 4210000.0
210 2019 Whiteshell Laboratories Americium-241 Bq 0.0 20100000.0
244 2020 Whiteshell Laboratories Americium-241 Bq 0.0 18000000.0
In [141]:
# No Stack Emissions reported (addressed in question (2)). Missing Direct Discharge values from 2013 to 2016.

plt.figure(figsize=(16,6))

year = df_cnl_a241['Year'].unique()

for facility in df_cnl_a241['Facility Name'].unique():
    plt.plot(year, df_cnl_a241[df_cnl_a241['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Americium-241 - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_a241['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(32) Why wasn't it reported from 2013 to 2016 & 2021?

(33) What happened in 2019 & 2020 that the discharge increased so much?

Argon-41¶

In [142]:
df_cnl_argon = df_cnl[df_cnl['Substance Name (English)'] == 'Argon-41']
df_cnl_argon.head()
Out[142]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
0 2013 Chalk River Laboratories Argon-41 Bq 8.460000e+15 0.0
26 2014 Chalk River Laboratories Argon-41 Bq 9.370000e+15 0.0
56 2015 Chalk River Laboratories Argon-41 Bq 1.290000e+16 0.0
86 2016 Chalk River Laboratories Argon-41 Bq 1.070000e+16 0.0
117 2017 Chalk River Laboratories Argon-41 Bq 1.160000e+16 0.0
In [143]:
# No Direct Discharge reported (addressed in question (3)). Zero values for 2019 - 2021 (not NRM, addressed in question (4))

plt.figure(figsize=(16,6))

year = df_cnl_argon['Year'].unique()

for facility in df_cnl_argon['Facility Name'].unique():
    plt.plot(year, df_cnl_argon[df_cnl_argon['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Argon-41 - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_argon['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(34) Why does Chalk River report zero from 2019 to 2021?

  • Addressed in question (4) already.

Cesium-137¶

In [144]:
df_cnl_cesium = df_cnl[df_cnl['Substance Name (English)'] == 'Cesium-137']
df_cnl_cesium.head()
Out[144]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
21 2013 Whiteshell Laboratories Cesium-137 Bq 0.0 64000000.0
50 2014 Whiteshell Laboratories Cesium-137 Bq 0.0 26600000.0
80 2015 Whiteshell Laboratories Cesium-137 Bq 0.0 16500000.0
111 2016 Whiteshell Laboratories Cesium-137 Bq 0.0 12800000.0
142 2017 Whiteshell Laboratories Cesium-137 Bq 0.0 18900000.0
In [145]:
# No Stack Emissions reported (addressed in question (2)).

plt.figure(figsize=(16,6))

year = df_cnl_cesium['Year'].unique()

for facility in df_cnl_cesium['Facility Name'].unique():
    plt.plot(year, df_cnl_cesium[df_cnl_cesium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Cesium-137 - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_cesium['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(35) Why does Whiteshell Laboratories have a peak in 2013?

Elemental Tritium (HT)¶

In [146]:
df_cnl_ht = df_cnl[df_cnl['Substance Name (English)'] == 'Elemental Tritium (HT)']
df_cnl_ht.head()
Out[146]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
2 2013 Chalk River Laboratories Elemental Tritium (HT) Bq 1.590000e+12 0.0
28 2014 Chalk River Laboratories Elemental Tritium (HT) Bq 1.370000e+12 0.0
58 2015 Chalk River Laboratories Elemental Tritium (HT) Bq 4.770000e+12 0.0
88 2016 Chalk River Laboratories Elemental Tritium (HT) Bq 2.550000e+12 0.0
119 2017 Chalk River Laboratories Elemental Tritium (HT) Bq 4.640000e+12 0.0
In [147]:
# No Direct Discharge reported (addressed in question (3)).

plt.figure(figsize=(16,6))

year = df_cnl_ht['Year'].unique()

for facility in df_cnl_ht['Facility Name'].unique():
    plt.plot(year, df_cnl_ht[df_cnl_ht['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Elemental Tritium (HT) - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_ht['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(36) Why does Chalk River Laboratories have a peak in 2015 & a spike from 2017 to 2020 with a peak in 2018?

Iodine-125¶

In [148]:
df_cnl_i125 = df_cnl[df_cnl['Substance Name (English)'] == 'Iodine-125']
df_cnl_i125.head()
Out[148]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
4 2013 Chalk River Laboratories Iodine-125 Bq 284000000.0 0.0
30 2014 Chalk River Laboratories Iodine-125 Bq 162000000.0 0.0
60 2015 Chalk River Laboratories Iodine-125 Bq 344000000.0 0.0
90 2016 Chalk River Laboratories Iodine-125 Bq 291000000.0 0.0
121 2017 Chalk River Laboratories Iodine-125 Bq 530000000.0 0.0
In [149]:
# No Direct Discharge reported (addressed in question (3)).

plt.figure(figsize=(16,6))

year = df_cnl_i125['Year'].unique()

for facility in df_cnl_i125['Facility Name'].unique():
    plt.plot(year, df_cnl_i125[df_cnl_i125['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Iodine-125 - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_i125['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(37) Why does Chalk River Laboratories have a peak in 2017, & why does it plummet from 2018 onward?

Iodine-131¶

In [150]:
df_cnl_i131 = df_cnl[df_cnl['Substance Name (English)'] == 'Iodine-131']
df_cnl_i131.head()
Out[150]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
5 2013 Chalk River Laboratories Iodine-131 Bq 1.380000e+11 0.0
31 2014 Chalk River Laboratories Iodine-131 Bq 2.060000e+11 0.0
61 2015 Chalk River Laboratories Iodine-131 Bq 1.030000e+11 0.0
91 2016 Chalk River Laboratories Iodine-131 Bq 5.170000e+10 0.0
122 2017 Chalk River Laboratories Iodine-131 Bq 3.780000e+08 0.0
In [151]:
# No Direct Discharge reported (addressed in question (3)).

plt.figure(figsize=(16,6))

year = df_cnl_i131['Year'].unique()

for facility in df_cnl_i131['Facility Name'].unique():
    plt.plot(year, df_cnl_i131[df_cnl_i131['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Iodine-131 - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_i131['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(38) Why does Chalk River Laboratories have a peak in 2014, & why does it come down so much from 2017 onward?

Plutonium-238¶

In [152]:
df_cnl_p238 = df_cnl[df_cnl['Substance Name (English)'] == 'Plutonium-238']
df_cnl_p238.head()
Out[152]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
146 2017 Whiteshell Laboratories Plutonium-238 Bq 0.0 8690000.0
181 2018 Whiteshell Laboratories Plutonium-238 Bq 0.0 18400000.0
215 2019 Whiteshell Laboratories Plutonium-238 Bq 0.0 48600000.0
249 2020 Whiteshell Laboratories Plutonium-238 Bq 0.0 23900000.0
In [153]:
# No Stack Emissions reported (addressed in question (2)).

plt.figure(figsize=(16,6))

year = df_cnl_p238['Year'].unique()

for facility in df_cnl_p238['Facility Name'].unique():
    plt.plot(year, df_cnl_p238[df_cnl_p238['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Plutonium-238 - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_p238['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(39) Why doesn't Whiteshell Laboratories report values from 2013 to 2016 or for 2021?

(40) Why does Whiteshell Laboratories have a peak in 2019?
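
Reporting gaps like the one in question (39) can be listed for every facility and substance at once instead of being read off the plots. The sketch below assumes that an unreported year shows up as an absent row (as in Out[152]) rather than as a NaN, and that 2013 to 2021 is the expected reporting window. `missing_years` is a hypothetical helper; the sample rows are the Plutonium-238 values shown above.

```python
import pandas as pd

def missing_years(df, expected_years):
    """List the expected years with no row, per facility and substance."""
    expected = set(expected_years)
    rows = []
    for (facility, substance), grp in df.groupby(
            ['Facility Name', 'Substance Name (English)']):
        gaps = sorted(expected - set(grp['Year'].tolist()))
        if gaps:
            rows.append({'Facility Name': facility,
                         'Substance Name (English)': substance,
                         'Missing Years': gaps})
    return pd.DataFrame(rows)

# Sample rows copied from the Plutonium-238 output above.
sample = pd.DataFrame({
    'Year': [2017, 2018, 2019, 2020],
    'Facility Name': ['Whiteshell Laboratories'] * 4,
    'Substance Name (English)': ['Plutonium-238'] * 4,
    'Direct Discharge': [8.69e6, 1.84e7, 4.86e7, 2.39e7],
})

gaps = missing_years(sample, range(2013, 2022))
print(gaps)
```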

Plutonium-239/240¶

In [154]:
df_cnl_p239 = df_cnl[df_cnl['Substance Name (English)'] == 'Plutonium-239/240']
df_cnl_p239.head()
Out[154]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
147 2017 Whiteshell Laboratories Plutonium-239/240 Bq 0.0 12000000.0
182 2018 Whiteshell Laboratories Plutonium-239/240 Bq 0.0 23200000.0
216 2019 Whiteshell Laboratories Plutonium-239/240 Bq 0.0 47000000.0
250 2020 Whiteshell Laboratories Plutonium-239/240 Bq 0.0 39400000.0
In [155]:
# No Stack Emissions reported (addressed in question (2)).

plt.figure(figsize=(16,6))

year = df_cnl_p239['Year'].unique()

for facility in df_cnl_p239['Facility Name'].unique():
    plt.plot(year, df_cnl_p239[df_cnl_p239['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Plutonium-239/240 - Direct Discharge [Bq]', size=12)
plt.legend(df_cnl_p239['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(41) Why doesn't Whiteshell Laboratories report values from 2013 to 2016 or for 2021?

(42) Why does Whiteshell Laboratories have a peak in 2019?

  • Note: Same reporting gaps and same 2019 peak as Plutonium-238.

Total noble gases¶

In [156]:
df_cnl_noble = df_cnl[df_cnl['Substance Name (English)'] == 'Total noble gases']
df_cnl_noble.head()
Out[156]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
9 2013 Chalk River Laboratories Total noble gases Bq-MeV 1.320000e+15 0.0
35 2014 Chalk River Laboratories Total noble gases Bq-MeV 2.110000e+15 0.0
65 2015 Chalk River Laboratories Total noble gases Bq-MeV 1.200000e+15 0.0
95 2016 Chalk River Laboratories Total noble gases Bq-MeV 3.970000e+14 0.0
126 2017 Chalk River Laboratories Total noble gases Bq-MeV 6.500000e+12 0.0
In [157]:
# No Direct Discharge reported (addressed in question (3)).

plt.figure(figsize=(16,6))

year = df_cnl_noble['Year'].unique()

for facility in df_cnl_noble['Facility Name'].unique():
    plt.plot(year, df_cnl_noble[df_cnl_noble['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Total noble gases - Stack Emissions [Bq-MeV]', size=12)
plt.legend(df_cnl_noble['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(43) Why does Chalk River Laboratories have a peak in 2014, drop so sharply from 2017 onward, and report zero from 2019 onward?

  • Note: Exactly the same pattern as Iodine-131.
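
The visual similarity between the two curves can be backed with a number. A minimal sketch, using only the five yearly values visible in Out[150] and Out[156] (2013 to 2017) and the Pearson correlation as one possible similarity measure; the full series would need to be used for a real conclusion.

```python
import pandas as pd

years = [2013, 2014, 2015, 2016, 2017]
# Stack Emissions for Chalk River Laboratories, copied from the outputs above
i131 = pd.Series([1.38e11, 2.06e11, 1.03e11, 5.17e10, 3.78e8], index=years)    # Bq
noble = pd.Series([1.32e15, 2.11e15, 1.20e15, 3.97e14, 6.50e12], index=years)  # Bq-MeV

# Pearson correlation of the two yearly series (unit-independent)
r = i131.corr(noble)
print(round(r, 3))

# Both series peak in the same year
print(i131.idxmax(), noble.idxmax())
```

A correlation close to 1 would be consistent with both releases being driven by the same underlying activity, which is itself a question worth putting to the licensee.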

Xenon-133¶

In [158]:
df_cnl_xenon = df_cnl[df_cnl['Substance Name (English)'] == 'Xenon-133']
df_cnl_xenon.head()
Out[158]:
Year Facility Name Substance Name (English) Units Stack Emissions Direct Discharge
11 2013 Chalk River Laboratories Xenon-133 Bq 5.720000e+15 0.0
37 2014 Chalk River Laboratories Xenon-133 Bq 8.680000e+15 0.0
67 2015 Chalk River Laboratories Xenon-133 Bq 4.890000e+15 0.0
97 2016 Chalk River Laboratories Xenon-133 Bq 3.120000e+15 0.0
In [159]:
# No Direct Discharge reported (addressed in question (3)).

plt.figure(figsize=(16,6))

year = df_cnl_xenon['Year'].unique()

for facility in df_cnl_xenon['Facility Name'].unique():
    plt.plot(year, df_cnl_xenon[df_cnl_xenon['Facility Name'] == facility]['Stack Emissions'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Xenon-133 - Stack Emissions [Bq]', size=12)
plt.legend(df_cnl_xenon['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(44) Why hasn't Chalk River Laboratories reported values since 2017?

(45) Why does Chalk River Laboratories have a peak in 2014?

Individual Plotting by Substance & Facility¶

In [160]:
facilities = df_cnl['Facility Name'].unique()

for f in facilities:
    df = df_cnl[df_cnl['Facility Name'] == f]
    print(f,'\n')
    subs = df['Substance Name (English)'].unique()
    for s in subs:
        df2 = df[df['Substance Name (English)'] == s]
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10,8))
        fig.subplots_adjust(hspace=0.5)
    
        ax1.plot(df2['Year'], df2['Stack Emissions'], color='green')
        ax1.set_title(s + ' - Stack Emissions', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax1.set_xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax1.grid()
    
        ax2.plot(df2['Year'], df2['Direct Discharge'], color='red')
        ax2.set_title(s + ' - Direct Discharge', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax2.set_xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        ax2.grid()
    
        plt.show()
Chalk River Laboratories 

Douglas Point 

Nuclear Power Demonstration 

Port Granby Project 

Port Hope Project 

Whiteshell Laboratories 

Comparison with 2020 Dataframe¶

In [161]:
df_cnl_2020 = pd.read_csv("./Datasets/2020/Canadian Nuclear Laboratories.csv", encoding='latin1')
df_cnl_2020.head()
Out[161]:
Year | Année NPRI ID | ID INRP Company Name | Raison Sociale Facility Name | Nom de l'installation City | Ville CSD | SDR CA or CMA | AR ou RMR Economic Region | Région économique Province | Province Latitude | Latitude Longitude | Longitude Substance Name (English) | Nom de substance (Anglais) Substance Name (French) | Nom de substance (Français) Units | Unités Stack Emissions | Émissions de cheminées Direct Discharge | Évacuations directes Footnotes | Notes de bas de page
0 2020 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Elemental Tritium (HT) Tritium élémentaire Bq 5.06E+12 NRM | NRS NaN
1 2020 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Tritium (HTO) Tritium (Eau tritiée) Bq 2.54E+13 1.08E+13 NaN
2 2020 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Carbon-14 Carbone-14 Bq 2.61E+10 NRM | NRS NaN
3 2020 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Total noble gases Total des gaz nobles Bq-MeV 0.00E+00 NRM | NRS NaN
4 2020 3147.0 Canadian Nuclear Laboratories / Laboratoires N... Chalk River Laboratories Chalk River Deep River Renfrew Kingston--Pembroke ON 46.0554 -77.3628 Iodine-131 Iode-131 Bq 2.44E+07 NRM | NRS NaN
In [162]:
# Keep only the essential columns: the others are not needed here, and some contain errors (NPRI ID, for example).
# .copy() makes the selection an independent dataframe, so the in-place rename below does not raise a SettingWithCopyWarning.

df_cnl_2021 = df_cnl_0[['Year | Année', "Facility Name | Nom de l'installation", 'Substance Name (English) | Nom de substance (Anglais)', 'Stack Emissions | Émissions de cheminées',
       'Direct Discharge | Évacuations directes']].copy()
df_cnl_2021.rename(columns={'Year | Année': 'Year', "Facility Name | Nom de l'installation":'Facility Name', 'Substance Name (English) | Nom de substance (Anglais)':'Substance Name (English)', 'Stack Emissions | Émissions de cheminées':'Stack Emissions','Direct Discharge | Évacuations directes':'Direct Discharge'}, inplace = True)
df_cnl_2021.head()
Out[162]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
0 2021 Chalk River Laboratories Elemental Tritium (HT) 2.08E+12 NRM | NRS
1 2021 Chalk River Laboratories Tritium (HTO) 2.49E+13 1.50E+13
2 2021 Chalk River Laboratories Carbon-14 0.00E+00 NRM | NRS
3 2021 Chalk River Laboratories Total noble gases 0.00E+00 NRM | NRS
4 2021 Chalk River Laboratories Iodine-125 1.76E+06 NRM | NRS
In [163]:
# I will remove 2021 from the new dataframe and compare the remaining with 2020.

df_cnl_2021 = df_cnl_2021[df_cnl_2021['Year'] != 2021].reset_index(drop = True)
df_cnl_2021.head()
Out[163]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
0 2020 Chalk River Laboratories Elemental Tritium (HT) 5.06E+12 NRM | NRS
1 2020 Chalk River Laboratories Tritium (HTO) 2.54E+13 1.08E+13
2 2020 Chalk River Laboratories Carbon-14 2.61E+10 NRM | NRS
3 2020 Chalk River Laboratories Total noble gases 0.00E+00 NRM | NRS
4 2020 Chalk River Laboratories Iodine-125 2.00E+06 NRM | NRS
In [164]:
# Same for the 2020 release: keep only the essential columns (.copy() avoids a SettingWithCopyWarning on the in-place rename below).

df_cnl_2020 = df_cnl_2020[['Year | Année', "Facility Name | Nom de l'installation", 'Substance Name (English) | Nom de substance (Anglais)', 'Stack Emissions | Émissions de cheminées',
       'Direct Discharge | Évacuations directes']].copy()
df_cnl_2020.rename(columns={'Year | Année': 'Year', "Facility Name | Nom de l'installation":'Facility Name', 'Substance Name (English) | Nom de substance (Anglais)':'Substance Name (English)', 'Stack Emissions | Émissions de cheminées':'Stack Emissions','Direct Discharge | Évacuations directes':'Direct Discharge'}, inplace = True)
df_cnl_2020.head()
Out[164]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
0 2020 Chalk River Laboratories Elemental Tritium (HT) 5.06E+12 NRM | NRS
1 2020 Chalk River Laboratories Tritium (HTO) 2.54E+13 1.08E+13
2 2020 Chalk River Laboratories Carbon-14 2.61E+10 NRM | NRS
3 2020 Chalk River Laboratories Total noble gases 0.00E+00 NRM | NRS
4 2020 Chalk River Laboratories Iodine-131 2.44E+07 NRM | NRS
In [165]:
# Concatenate both dataframes and drop every row that appears in both (keep=False), so only the rows that differ remain. In the output, each changed entry appears as a pair of rows: the 2021 release's value first, then the 2020 release's value.

df = pd.concat([df_cnl_2021,df_cnl_2020]).drop_duplicates(keep=False)
df = df.sort_values(by=['Facility Name', 'Substance Name (English)', 'Year'])
df

# I will explore the differences:
Out[165]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
74 2018 Chalk River Laboratories Argon-41 2.64E+15 NRM | NRS
69 2018 Chalk River Laboratories Argon-41 2.59E+15 NRM | NRS
143 2016 Chalk River Laboratories Carbon-14 4.85E+11 NRM | NRS
135 2016 Chalk River Laboratories Carbon-14 4.84E+11 NRM | NRS
107 2017 Chalk River Laboratories Carbon-14 4.91E+11 NRM | NRS
101 2017 Chalk River Laboratories Carbon-14 4.90E+11 NRM | NRS
70 2018 Chalk River Laboratories Carbon-14 2.59E+11 NRM | NRS
66 2018 Chalk River Laboratories Carbon-14 2.54E+11 NRM | NRS
44 2019 Chalk River Laboratories Estimated public dose (see footnote) 0.0039 NRM | NRS
40 2019 Chalk River Laboratories Estimated public dose (see footnote) 0.0038 NRM | NRS
10 2020 Chalk River Laboratories Estimated public dose (see footnote) 0.0074 NRM | NRS
8 2020 Chalk River Laboratories Estimated public dose (see footnote) 0.0059 NRM | NRS
236 2013 Chalk River Laboratories Iodine-125 2.84E+08 NRM | NRS
206 2014 Chalk River Laboratories Iodine-125 1.62E+08 NRM | NRS
176 2015 Chalk River Laboratories Iodine-125 3.44E+08 NRM | NRS
145 2016 Chalk River Laboratories Iodine-125 2.91E+08 NRM | NRS
109 2017 Chalk River Laboratories Iodine-125 5.30E+08 NRM | NRS
72 2018 Chalk River Laboratories Iodine-125 9.67E+07 NRM | NRS
38 2019 Chalk River Laboratories Iodine-125 2.44E+06 NRM | NRS
4 2020 Chalk River Laboratories Iodine-125 2.00E+06 NRM | NRS
110 2017 Chalk River Laboratories Iodine-131 3.78E+08 NRM | NRS
103 2017 Chalk River Laboratories Iodine-131 3.82E+08 NRM | NRS
73 2018 Chalk River Laboratories Iodine-131 1.05E+08 NRM | NRS
68 2018 Chalk River Laboratories Iodine-131 1.02E+08 NRM | NRS
242 2013 Chalk River Laboratories Strontium-90 NRM | NRS 1.51E+10
212 2014 Chalk River Laboratories Strontium-90 NRM | NRS 2.26E+11
182 2015 Chalk River Laboratories Strontium-90 NRM | NRS 6.70E+10
151 2016 Chalk River Laboratories Strontium-90 NRM | NRS 7.30E+09
114 2017 Chalk River Laboratories Strontium-90 NRM | NRS 1.66E+10
77 2018 Chalk River Laboratories Strontium-90 NRM | NRS 8.72E+09
43 2019 Chalk River Laboratories Strontium-90 NRM | NRS 2.57E+09
9 2020 Chalk River Laboratories Strontium-90 NRM | NRS 7.07E+09
144 2016 Chalk River Laboratories Total noble gases 3.97E+14 NRM | NRS
136 2016 Chalk River Laboratories Total noble gases 8.50E+14 NRM | NRS
142 2016 Chalk River Laboratories Tritium (HTO) 2.45E+14 3.50E+13
134 2016 Chalk River Laboratories Tritium (HTO) 2.30E+14 3.50E+13
106 2017 Chalk River Laboratories Tritium (HTO) 2.53E+14 3.81E+13
100 2017 Chalk River Laboratories Tritium (HTO) 2.50E+14 3.81E+13
69 2018 Chalk River Laboratories Tritium (HTO) 2.34E+14 1.93E+13
65 2018 Chalk River Laboratories Tritium (HTO) 2.29E+14 1.93E+13
35 2019 Chalk River Laboratories Tritium (HTO) 2.01E+14 1.31E+13
33 2019 Chalk River Laboratories Tritium (HTO) 2.01E+14 1.37E+13
20 2020 Whiteshell Laboratories Estimated public dose (see footnote) 3.00E-06 NRM | NRS
18 2020 Whiteshell Laboratories Estimated public dose (see footnote) 0.00E+00 NRM | NRS
In [166]:
df_cnl_2021[(df_cnl_2021['Facility Name'] == 'Chalk River Laboratories') & (df_cnl_2021['Substance Name (English)'] == 'Iodine-125')]
Out[166]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
4 2020 Chalk River Laboratories Iodine-125 2.00E+06 NRM | NRS
38 2019 Chalk River Laboratories Iodine-125 2.44E+06 NRM | NRS
72 2018 Chalk River Laboratories Iodine-125 9.67E+07 NRM | NRS
109 2017 Chalk River Laboratories Iodine-125 5.30E+08 NRM | NRS
145 2016 Chalk River Laboratories Iodine-125 2.91E+08 NRM | NRS
176 2015 Chalk River Laboratories Iodine-125 3.44E+08 NRM | NRS
206 2014 Chalk River Laboratories Iodine-125 1.62E+08 NRM | NRS
236 2013 Chalk River Laboratories Iodine-125 2.84E+08 NRM | NRS
In [167]:
df_cnl_2020[(df_cnl_2020['Facility Name'] == 'Chalk River Laboratories') & (df_cnl_2020['Substance Name (English)'] == 'Iodine-125')]
Out[167]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge

(46) Where did the Iodine-125 values for Chalk River Laboratories come from? The 2021 release reports them for every year from 2013 to 2020, but the 2020 release has no Iodine-125 rows for this facility.

In [168]:
df_cnl_2021[(df_cnl_2021['Facility Name'] == 'Chalk River Laboratories') & (df_cnl_2021['Substance Name (English)'] == 'Strontium-90')]
Out[168]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
9 2020 Chalk River Laboratories Strontium-90 NRM | NRS 7.07E+09
43 2019 Chalk River Laboratories Strontium-90 NRM | NRS 2.57E+09
77 2018 Chalk River Laboratories Strontium-90 NRM | NRS 8.72E+09
114 2017 Chalk River Laboratories Strontium-90 NRM | NRS 1.66E+10
151 2016 Chalk River Laboratories Strontium-90 NRM | NRS 7.30E+09
182 2015 Chalk River Laboratories Strontium-90 NRM | NRS 6.70E+10
212 2014 Chalk River Laboratories Strontium-90 NRM | NRS 2.26E+11
242 2013 Chalk River Laboratories Strontium-90 NRM | NRS 1.51E+10
In [169]:
df_cnl_2020[(df_cnl_2020['Facility Name'] == 'Chalk River Laboratories') & (df_cnl_2020['Substance Name (English)'] == 'Strontium-90')]
Out[169]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge

(47) Where did the Strontium-90 values for Chalk River Laboratories come from? The 2021 release reports them for every year from 2013 to 2020, but the 2020 release has no Strontium-90 rows for this facility.
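
Series that were added retroactively, as in questions (46) and (47), can be found without filtering substance by substance. The sketch below is one way to do it, assuming the renamed columns used above: a left merge on the facility/substance keys with `indicator=True`. `added_series` and the two small sample frames are hypothetical stand-ins for `df_cnl_2021` and `df_cnl_2020`.

```python
import pandas as pd

KEY = ['Facility Name', 'Substance Name (English)']

def added_series(new, old):
    """Facility/substance pairs present in `new` but absent from `old`."""
    new_keys = new[KEY].drop_duplicates()
    old_keys = old[KEY].drop_duplicates()
    # indicator=True adds a '_merge' column: 'both' or 'left_only'
    merged = new_keys.merge(old_keys, on=KEY, how='left', indicator=True)
    return merged.loc[merged['_merge'] == 'left_only', KEY].reset_index(drop=True)

crl = 'Chalk River Laboratories'
# Hypothetical miniature versions of the two releases (2020 rows only).
new = pd.DataFrame({
    'Year': [2020] * 4,
    'Facility Name': [crl] * 4,
    'Substance Name (English)': ['Tritium (HTO)', 'Iodine-125',
                                 'Iodine-131', 'Strontium-90'],
})
old = pd.DataFrame({
    'Year': [2020] * 2,
    'Facility Name': [crl] * 2,
    'Substance Name (English)': ['Tritium (HTO)', 'Iodine-131'],
})

added = added_series(new, old)
print(added)
```

Swapping the arguments (`added_series(old, new)`) would list series that were removed instead.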

In [170]:
df[(df['Substance Name (English)'] != 'Iodine-125') & (df['Substance Name (English)'] != 'Strontium-90')]
Out[170]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
74 2018 Chalk River Laboratories Argon-41 2.64E+15 NRM | NRS
69 2018 Chalk River Laboratories Argon-41 2.59E+15 NRM | NRS
143 2016 Chalk River Laboratories Carbon-14 4.85E+11 NRM | NRS
135 2016 Chalk River Laboratories Carbon-14 4.84E+11 NRM | NRS
107 2017 Chalk River Laboratories Carbon-14 4.91E+11 NRM | NRS
101 2017 Chalk River Laboratories Carbon-14 4.90E+11 NRM | NRS
70 2018 Chalk River Laboratories Carbon-14 2.59E+11 NRM | NRS
66 2018 Chalk River Laboratories Carbon-14 2.54E+11 NRM | NRS
44 2019 Chalk River Laboratories Estimated public dose (see footnote) 0.0039 NRM | NRS
40 2019 Chalk River Laboratories Estimated public dose (see footnote) 0.0038 NRM | NRS
10 2020 Chalk River Laboratories Estimated public dose (see footnote) 0.0074 NRM | NRS
8 2020 Chalk River Laboratories Estimated public dose (see footnote) 0.0059 NRM | NRS
110 2017 Chalk River Laboratories Iodine-131 3.78E+08 NRM | NRS
103 2017 Chalk River Laboratories Iodine-131 3.82E+08 NRM | NRS
73 2018 Chalk River Laboratories Iodine-131 1.05E+08 NRM | NRS
68 2018 Chalk River Laboratories Iodine-131 1.02E+08 NRM | NRS
144 2016 Chalk River Laboratories Total noble gases 3.97E+14 NRM | NRS
136 2016 Chalk River Laboratories Total noble gases 8.50E+14 NRM | NRS
142 2016 Chalk River Laboratories Tritium (HTO) 2.45E+14 3.50E+13
134 2016 Chalk River Laboratories Tritium (HTO) 2.30E+14 3.50E+13
106 2017 Chalk River Laboratories Tritium (HTO) 2.53E+14 3.81E+13
100 2017 Chalk River Laboratories Tritium (HTO) 2.50E+14 3.81E+13
69 2018 Chalk River Laboratories Tritium (HTO) 2.34E+14 1.93E+13
65 2018 Chalk River Laboratories Tritium (HTO) 2.29E+14 1.93E+13
35 2019 Chalk River Laboratories Tritium (HTO) 2.01E+14 1.31E+13
33 2019 Chalk River Laboratories Tritium (HTO) 2.01E+14 1.37E+13
20 2020 Whiteshell Laboratories Estimated public dose (see footnote) 3.00E-06 NRM | NRS
18 2020 Whiteshell Laboratories Estimated public dose (see footnote) 0.00E+00 NRM | NRS

(48) Why did these 14 sets of values change between releases, and why was the change not documented anywhere?

  • Note: Almost all of the changed values (13 of 14) are for Chalk River Laboratories.
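
The paired rows above are easier to read side by side, with the size of each revision made explicit. A minimal sketch, assuming the renamed columns from the cells above: an inner merge on year, facility and substance, followed by a relative change. `changed_values` is a hypothetical helper. The sample values come from the outputs above, and which value belongs to which release is inferred from the row indices (the 2021 release carries the higher index).

```python
import pandas as pd

KEY = ['Year', 'Facility Name', 'Substance Name (English)']

def changed_values(new, old, col):
    """Rows whose numeric `col` differs between two releases."""
    both = new.merge(old, on=KEY, suffixes=('_new', '_old'))
    # Non-numeric entries such as 'NRM | NRS' become NaN and are dropped below,
    # so a switch between a number and a text code is NOT reported here.
    new_val = pd.to_numeric(both[col + '_new'], errors='coerce')
    old_val = pd.to_numeric(both[col + '_old'], errors='coerce')
    out = both[KEY].copy()
    out['new'] = new_val
    out['old'] = old_val
    out['rel_change'] = (new_val - old_val) / old_val
    out = out[new_val != old_val].dropna(subset=['new', 'old'])
    return out.reset_index(drop=True)

crl = 'Chalk River Laboratories'
new = pd.DataFrame({
    'Year': [2018, 2016, 2020],
    'Facility Name': [crl] * 3,
    'Substance Name (English)': ['Argon-41', 'Total noble gases', 'Tritium (HTO)'],
    'Stack Emissions': ['2.64E+15', '3.97E+14', '2.54E+13'],
})
old = new.assign(**{'Stack Emissions': ['2.59E+15', '8.50E+14', '2.54E+13']})

changes = changed_values(new, old, 'Stack Emissions')
print(changes)
```

Sorting the result by the absolute value of `rel_change` would separate small corrections (about 2% for Argon-41 in 2018) from large ones (the 2016 total noble gases value was roughly halved).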

Uranium Mines and Mills¶

First Look at the Dataframe¶

In [171]:
df_umm = pd.read_csv("./Datasets/Uranium Mines and Mills.csv")
df_umm
Out[171]:
Year | Année NPRI ID | ID INRP Company Name | Raison Sociale Facility Name | Nom de l'installation City | Ville CSD | SDR CA or CMA | AR ou RMR Economic Region | Région économique Province | Province Latitude | Latitude Longitude | Longitude Substance Name (English) | Nom de substance (Anglais) Substance Name (French) | Nom de substance (Français) Units | Unités Stack Emissions | Émissions de cheminées Direct Discharge | Évacuations directes Footnotes | Notes de bas de page Unnamed: 17 Unnamed: 18
0 2021 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Uranium Uranium kg NRM | NRS 68.9 NaN NaN NaN
1 2021 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Thorium-230 Thorium-230 MBq NRM | NRS 57.5 NaN NaN NaN
2 2021 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Radium-226 Radium-226 MBq NRM | NRS 22.6 NaN NaN NaN
3 2021 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Lead-210 Plomb-210 MBq NRM | NRS 76.6 NaN NaN NaN
4 2021 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Polonium-210 Polonium-210 MBq NRM | NRS 43.1 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
220 2013 4866 Orano McClean Lake Saskatoon NaN NaN NaN SK 58.2611 -103.8025 Uranium Uranium kg NRM | NRS 1.8 NaN NaN NaN
221 2013 4866 Orano McClean Lake Saskatoon NaN NaN NaN SK 58.2611 -103.8025 Thorium-230 Thorium-230 MBq NRM | NRS 19.6 NaN NaN NaN
222 2013 4866 Orano McClean Lake Saskatoon NaN NaN NaN SK 58.2611 -103.8025 Radium-226 Radium-226 MBq NRM | NRS 6.0 NaN NaN NaN
223 2013 4866 Orano McClean Lake Saskatoon NaN NaN NaN SK 58.2611 -103.8025 Lead-210 Plomb-210 MBq NRM | NRS 74.4 NaN NaN NaN
224 2013 4866 Orano McClean Lake Saskatoon NaN NaN NaN SK 58.2611 -103.8025 Polonium-210 Polonium-210 MBq NRM | NRS 17.7 NaN NaN NaN

225 rows × 19 columns

(1) Why does the data start in 2013? Can we get older data?

In [172]:
# I'm creating a copy because I will need it later.

df_umm_0 = df_umm.copy()
In [173]:
df_umm["Facility Name | Nom de l'installation"].unique()
Out[173]:
array(['Rabbit Lake', 'Key Lake', 'McArthur River', 'Cigar Lake',
       'McClean Lake'], dtype=object)
In [174]:
# All Stack Emissions are NRM (Not Required to Monitor). So I will focus this analysis on Direct Discharge only.

df_umm['Stack Emissions | Émissions de cheminées'].unique()
Out[174]:
array(['NRM | NRS'], dtype=object)

(2) Why don't they monitor and report Stack Emissions? Can you confirm that there are no radionuclide stack emissions at all?

In [175]:
# Renaming columns to English only:

df_umm.rename(columns={'Year | Année': 'Year', 'NPRI ID | ID INRP': 'NPRI ID','Company Name | Raison Sociale':'Company Name',"Facility Name | Nom de l'installation":'Facility Name', 'City | Ville':'City', 'CSD | SDR':'CSD','CA or CMA | AR ou RMR':'CA or CMA', 'Economic Region | Région économique':'Economic Region','Province | Province':'Province', 'Latitude | Latitude':'Latitude', 'Longitude | Longitude':'Longitude','Substance Name (English) | Nom de substance (Anglais)':'Substance Name (English)','Units | Unités':'Units', 'Stack Emissions | Émissions de cheminées':'Stack Emissions','Direct Discharge | Évacuations directes':'Direct Discharge','Footnotes | Notes de bas de page':'Footnotes'}, inplace = True)
df_umm.head()
Out[175]:
Year NPRI ID Company Name Facility Name City CSD CA or CMA Economic Region Province Latitude Longitude Substance Name (English) Substance Name (French) | Nom de substance (Français) Units Stack Emissions Direct Discharge Footnotes Unnamed: 17 Unnamed: 18
0 2021 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Uranium Uranium kg NRM | NRS 68.9 NaN NaN NaN
1 2021 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Thorium-230 Thorium-230 MBq NRM | NRS 57.5 NaN NaN NaN
2 2021 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Radium-226 Radium-226 MBq NRM | NRS 22.6 NaN NaN NaN
3 2021 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Lead-210 Plomb-210 MBq NRM | NRS 76.6 NaN NaN NaN
4 2021 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Polonium-210 Polonium-210 MBq NRM | NRS 43.1 NaN NaN NaN
In [176]:
# After checking Direct Discharge, I noticed there is a value "DL" & a 0 value:

df_umm[(df_umm['Direct Discharge'] == 'DL') | (df_umm['Direct Discharge'] == '0.0')]
Out[176]:
Year NPRI ID Company Name Facility Name City CSD CA or CMA Economic Region Province Latitude Longitude Substance Name (English) Substance Name (French) | Nom de substance (Français) Units Stack Emissions Direct Discharge Footnotes Unnamed: 17 Unnamed: 18
15 2021 19397 Cameco Cigar Lake Saskatoon NaN NaN NaN SK 58.0686 -104.5406 Uranium Uranium kg NRM | NRS 0.0 NaN NaN NaN
203 2013 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Lead-210 Plomb-210 MBq NRM | NRS DL Loadings were less than the detection limit | ... NaN NaN

(3) Summary of Special Values in Direct Discharge:

  • Cigar Lake, Uranium, 2021: reported 0 kg of Direct Discharge. Why is that? (See the Uranium graph below: the value had been approaching zero for several years, with 0.2 kg in 2018 and 2019 and 0.1 kg in 2020.)
  • Rabbit Lake, Lead-210, 2013: "DL" (footnote: "Loadings were less than the detection limit").
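
Special values such as "DL" were found here by inspection. A hedged sketch of a systematic check: try to parse the column as numbers and count whatever fails. `non_numeric_tokens` is a hypothetical helper and `raw` is a small stand-in for the `Direct Discharge` column.

```python
import pandas as pd

def non_numeric_tokens(series):
    """Count the distinct entries that cannot be parsed as numbers."""
    parsed = pd.to_numeric(series, errors='coerce')
    # present in the raw data, but not parseable as a number
    bad = series[parsed.isna() & series.notna()]
    return bad.value_counts()

# Stand-in for df_umm['Direct Discharge'] before cleaning
raw = pd.Series(['68.9', '0.0', 'DL', '57.5'])

tokens = non_numeric_tokens(raw)
print(tokens)
```

The same call on `Stack Emissions` would surface codes such as `NRM | NRS` together with how often each occurs.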

Creating Geographic Reference Table¶

In [177]:
# Cleaning the geography dataframe:

df_umm_geography = df_umm[['Facility Name', 'NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude']]
# Chaining the calls (instead of inplace=True on a slice) avoids a SettingWithCopyWarning.
df_umm_geography = df_umm_geography.drop_duplicates().reset_index(drop=True)
df_umm_geography
Out[177]:
Facility Name NPRI ID Company Name City CSD CA or CMA Economic Region Province Latitude Longitude
0 Rabbit Lake 1147 Cameco Saskatoon NaN NaN NaN SK 58.1978 -103.7136
1 Key Lake 1148 Cameco Saskatoon NaN NaN NaN SK 57.2067 -105.6592
2 McArthur River 1149 Cameco Saskatoon NaN NaN NaN SK 57.7625 -105.0519
3 Cigar Lake 19397 Cameco Saskatoon NaN NaN NaN SK 58.0686 -104.5406
4 McClean Lake 4866 Orano Saskatoon NaN NaN NaN SK 58.2611 -103.8025

Cleaning of the Dataframe¶

In [178]:
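# Replacing the 'DL' (below detection limit) entry with 0 so the column can be converted to numeric for later plotting.
# The result is assigned back instead of using inplace=True on a selected column, which newer pandas versions warn about.

df_umm['Direct Discharge'] = df_umm['Direct Discharge'].replace('DL', 0)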
In [179]:
# Converting the column to numeric for plotting:

df_umm['Direct Discharge'] = pd.to_numeric(df_umm['Direct Discharge'])
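
Replacing "DL" with 0 makes a below-detection-limit result indistinguishable from a measured zero (such as Cigar Lake's 2021 uranium value). A sketch of one alternative, not the approach used in this notebook: record the flag in its own column before converting, and let the entry become NaN. The `Below DL` column name and the `sample` frame are assumptions; the two values are Rabbit Lake's Lead-210 entries for 2013 and 2021 shown above.

```python
import pandas as pd

sample = pd.DataFrame({
    'Year': [2013, 2021],
    'Facility Name': ['Rabbit Lake', 'Rabbit Lake'],
    'Substance Name (English)': ['Lead-210', 'Lead-210'],
    'Direct Discharge': ['DL', '76.6'],
})

# Remember which rows were reported as below the detection limit ...
sample['Below DL'] = sample['Direct Discharge'].eq('DL')
# ... then convert; 'DL' becomes NaN instead of a measured-looking 0.
# Note: errors='coerce' also turns any other text code into NaN.
sample['Direct Discharge'] = pd.to_numeric(sample['Direct Discharge'],
                                           errors='coerce')
print(sample)
```

Line plots then show a gap at the flagged year instead of a dip to zero.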
In [180]:
df_umm.drop(columns=['NPRI ID', 'Company Name', 'City', 'CSD', 'CA or CMA', 'Economic Region', 'Province', 'Latitude', 'Longitude', 'Substance Name (French) | Nom de substance (Français)', 'Stack Emissions', 'Footnotes', 'Unnamed: 17', 'Unnamed: 18'], inplace=True)
df_umm.head()
Out[180]:
Year Facility Name Substance Name (English) Units Direct Discharge
0 2021 Rabbit Lake Uranium kg 68.9
1 2021 Rabbit Lake Thorium-230 MBq 57.5
2 2021 Rabbit Lake Radium-226 MBq 22.6
3 2021 Rabbit Lake Lead-210 MBq 76.6
4 2021 Rabbit Lake Polonium-210 MBq 43.1
In [181]:
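# First, I'm saving the clean dataframe to build a dashboard in Tableau.
# Forward slashes match the read_csv paths used elsewhere and avoid the invalid "\D" and "\d" escape sequences.

df_umm.to_csv("./Datasets/df_umm.csv", index=True, header=True)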

Plotting by Substance¶

Uranium¶

In [182]:
df_umm_uranium = df_umm[df_umm['Substance Name (English)'] == 'Uranium']
df_umm_uranium.head()
Out[182]:
Year Facility Name Substance Name (English) Units Direct Discharge
0 2021 Rabbit Lake Uranium kg 68.9
5 2021 Key Lake Uranium kg 49.1
10 2021 McArthur River Uranium kg 18.5
15 2021 Cigar Lake Uranium kg 0.0
20 2021 McClean Lake Uranium kg 10.2
In [183]:
plt.figure(figsize=(16,6))

year = df_umm_uranium['Year'].unique()

for facility in df_umm_uranium['Facility Name'].unique():
    plt.plot(year, df_umm_uranium[df_umm_uranium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Uranium - Direct Discharge [kg]', size=12)
plt.legend(df_umm_uranium['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(4) Why does Rabbit Lake discharge so much more than the rest, and why the peaks in 2013 and 2016?

In [184]:
# plotting without Rabbit Lake to have a better look at the rest.

df_umm_uranium_2 = df_umm_uranium[df_umm_uranium['Facility Name'] != 'Rabbit Lake']

plt.figure(figsize=(16,6))

year = df_umm_uranium_2['Year'].unique()

for facility in df_umm_uranium_2['Facility Name'].unique():
    plt.plot(year, df_umm_uranium_2[df_umm_uranium_2['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Uranium - Direct Discharge [kg]', size=12)
plt.legend(df_umm_uranium_2['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(5) Why does Key Lake have a spike from 2018 to 2021, with peaks in 2019 & 2021?

(6) Why does Cigar Lake have a peak in 2015? (from 6.6 kg to 38 kg and then back to 2.4 kg)
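
Peaks like the one in question (6) are identified here by eye. As a rough, reproducible screen, a point can be flagged when it exceeds some multiple of the mean of its two neighbouring years. This is only one of many possible rules; the `factor` of 3 is an arbitrary assumption and `flag_peaks` is a hypothetical helper. The three values are the ones quoted in question (6), taken to be 2014, 2015 and 2016.

```python
import pandas as pd

def flag_peaks(series, factor=3.0):
    """Return points exceeding `factor` times the mean of their two neighbours."""
    s = series.sort_index()
    neighbour_mean = (s.shift(1) + s.shift(-1)) / 2
    # The first and last points lack a two-sided neighbourhood (NaN),
    # so the comparison is False and they are never flagged.
    return s[s > factor * neighbour_mean]

# Cigar Lake uranium Direct Discharge [kg], as quoted in question (6)
cigar_lake_uranium = pd.Series({2016: 2.4, 2015: 38.0, 2014: 6.6})

peaks = flag_peaks(cigar_lake_uranium)
print(peaks)
```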

Thorium-230¶

In [185]:
df_umm_thorium = df_umm[df_umm['Substance Name (English)'] == 'Thorium-230']
df_umm_thorium.head()
Out[185]:
Year Facility Name Substance Name (English) Units Direct Discharge
1 2021 Rabbit Lake Thorium-230 MBq 57.5
6 2021 Key Lake Thorium-230 MBq 23.7
11 2021 McArthur River Thorium-230 MBq 26.2
16 2021 Cigar Lake Thorium-230 MBq 3.5
21 2021 McClean Lake Thorium-230 MBq 18.1
In [186]:
plt.figure(figsize=(16,6))

year = df_umm_thorium['Year'].unique()

for facility in df_umm_thorium['Facility Name'].unique():
    plt.plot(year, df_umm_thorium[df_umm_thorium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Thorium-230 - Direct Discharge [MBq]', size=12)
plt.legend(df_umm_thorium['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(7) Why does Rabbit Lake have a peak in 2017?

(8) Why does Key Lake have a spike from 2015 to 2017 & a peak in 2020?

Radium-226¶

In [187]:
df_umm_radium = df_umm[df_umm['Substance Name (English)'] == 'Radium-226']
df_umm_radium.head()
Out[187]:
Year Facility Name Substance Name (English) Units Direct Discharge
2 2021 Rabbit Lake Radium-226 MBq 22.6
7 2021 Key Lake Radium-226 MBq 42.1
12 2021 McArthur River Radium-226 MBq 106.7
17 2021 Cigar Lake Radium-226 MBq 2.3
22 2021 McClean Lake Radium-226 MBq 17.8
In [188]:
plt.figure(figsize=(16,6))

year = df_umm_radium['Year'].unique()

for facility in df_umm_radium['Facility Name'].unique():
    plt.plot(year, df_umm_radium[df_umm_radium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Radium-226 - Direct Discharge [MBq]', size=12)
plt.legend(df_umm_radium['Facility Name'].unique(), loc='upper right')
plt.grid()

plt.show()

(9) Why does McArthur River have significantly more discharge than the rest?

(10) Why does Key Lake have a peak in 2019?
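
"Significantly more" in question (9) can be stated as a share of the total. A small sketch using only the 2021 Radium-226 values shown in Out[187]; the other years would need the same treatment before drawing a conclusion.

```python
import pandas as pd

# Radium-226 Direct Discharge in 2021 [MBq], copied from the output above
radium_2021 = pd.Series({
    'Rabbit Lake': 22.6,
    'Key Lake': 42.1,
    'McArthur River': 106.7,
    'Cigar Lake': 2.3,
    'McClean Lake': 17.8,
})

# Each facility's fraction of the five-facility total for that year
share = radium_2021 / radium_2021.sum()
print(share.sort_values(ascending=False).round(3))
```

On these figures McArthur River accounts for more than half of the reported Radium-226 discharge across the five facilities in 2021.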

Lead-210¶

In [189]:
df_umm_lead = df_umm[df_umm['Substance Name (English)'] == 'Lead-210']
df_umm_lead.head()
Out[189]:
Year Facility Name Substance Name (English) Units Direct Discharge
3 2021 Rabbit Lake Lead-210 MBq 76.6
8 2021 Key Lake Lead-210 MBq 27.2
13 2021 McArthur River Lead-210 MBq 56.1
18 2021 Cigar Lake Lead-210 MBq 7.7
23 2021 McClean Lake Lead-210 MBq 34.3
In [190]:
plt.figure(figsize=(16,6))

year = df_umm_lead['Year'].unique()

for facility in df_umm_lead['Facility Name'].unique():
    plt.plot(year, df_umm_lead[df_umm_lead['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Lead-210 - Direct Discharge [MBq]', size=12)
plt.legend(df_umm_lead['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(11) Why does Rabbit Lake release significantly more than the rest, and why does it have a spike from 2015 to 2019, with peaks in 2015, 2016 and 2018?

Polonium-210¶

In [191]:
df_umm_polonium = df_umm[df_umm['Substance Name (English)'] == 'Polonium-210']
df_umm_polonium.head()
Out[191]:
Year Facility Name Substance Name (English) Units Direct Discharge
4 2021 Rabbit Lake Polonium-210 MBq 43.1
9 2021 Key Lake Polonium-210 MBq 35.5
14 2021 McArthur River Polonium-210 MBq 14.7
19 2021 Cigar Lake Polonium-210 MBq 3.5
24 2021 McClean Lake Polonium-210 MBq 20.2
In [192]:
plt.figure(figsize=(16,6))

year = df_umm_polonium['Year'].unique()

for facility in df_umm_polonium['Facility Name'].unique():
    plt.plot(year, df_umm_polonium[df_umm_polonium['Facility Name'] == facility]['Direct Discharge'])
plt.xticks(year)
plt.xlabel('Year', size=16)
plt.ylabel('Polonium-210 - Direct Discharge [MBq]', size=12)
plt.legend(df_umm_polonium['Facility Name'].unique(), loc='upper left')
plt.grid()

plt.show()

(12) Why does McArthur River have a peak in 2015?

(13) Why does Rabbit Lake have a peak in 2013?

(14) Why does Key Lake have a peak in 2014?

(15) Why does McClean Lake have a peak in 2016?

Individual Plotting by Substance & Facility¶

In [193]:
facilities = df_umm['Facility Name'].unique()

for f in facilities:
    df = df_umm[df_umm['Facility Name'] == f]
    print(f,'\n')
    subs = df['Substance Name (English)'].unique()
    for s in subs:
        df2 = df[df['Substance Name (English)'] == s]
        plt.figure(figsize=(12,4))
        plt.plot(df2['Year'], df2['Direct Discharge'], color='red')
        plt.title(s + ' - Direct Discharge', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        plt.xlabel('Year', fontdict={'fontsize': 14, 'fontweight': 'medium'})
        plt.grid()
    
        plt.show()
Rabbit Lake 

Key Lake 

McArthur River 

Cigar Lake 

McClean Lake 

Comparison with 2020 Dataframe¶

In [194]:
df_umm_2020 = pd.read_csv("./Datasets/2020/Uranium Mines and Mills.csv")
df_umm_2020.head()
Out[194]:
_id Year | Annee NPRI ID | ID INRP Company Name | Raison Sociale Facility Name | Nom de l'installation City | Ville CSD | SDR CA or CMA | AR ou RMR Economic Region | Region economique Province | Province Latitude | Latitude Longitude | Longitude Substance Name (English) | Nom de substance (Anglais) Substance Name (French) | Nom de substance (Francais) Units | Unites Stack Emissions | Emissions de cheminees Direct Discharge | Evacuations directes Footnotes | Notes de bas de page
0 1 2020 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Uranium Uranium kg NRM | NRS 80.3 NaN
1 2 2020 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Thorium-230 Thorium-230 MBq NRM | NRS 75.6 NaN
2 3 2020 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Radium-226 Radium-226 MBq NRM | NRS 24.0 NaN
3 4 2020 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Lead-210 Plomb-210 MBq NRM | NRS 75.6 NaN
4 5 2020 1147 Cameco Rabbit Lake Saskatoon NaN NaN NaN SK 58.1978 -103.7136 Polonium-210 Polonium-210 MBq NRM | NRS 32.1 NaN
In [195]:
# Keep only the essential columns. Chaining .rename() on the selection returns a new
# dataframe and avoids the SettingWithCopyWarning raised by inplace=True on a slice.

df_umm_2021 = df_umm_0[['Year | Année', "Facility Name | Nom de l'installation",
                        'Substance Name (English) | Nom de substance (Anglais)',
                        'Stack Emissions | Émissions de cheminées',
                        'Direct Discharge | Évacuations directes']].rename(columns={
    'Year | Année': 'Year',
    "Facility Name | Nom de l'installation": 'Facility Name',
    'Substance Name (English) | Nom de substance (Anglais)': 'Substance Name (English)',
    'Stack Emissions | Émissions de cheminées': 'Stack Emissions',
    'Direct Discharge | Évacuations directes': 'Direct Discharge'})
df_umm_2021.head()
Out[195]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
0 2021 Rabbit Lake Uranium NRM | NRS 68.9
1 2021 Rabbit Lake Thorium-230 NRM | NRS 57.5
2 2021 Rabbit Lake Radium-226 NRM | NRS 22.6
3 2021 Rabbit Lake Lead-210 NRM | NRS 76.6
4 2021 Rabbit Lake Polonium-210 NRM | NRS 43.1
In [196]:
# Remove the 2021 rows from the new dataframe so that the remaining years can be compared with the 2020 file.

df_umm_2021 = df_umm_2021[df_umm_2021['Year'] != 2021].reset_index(drop = True)
df_umm_2021.head()
Out[196]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
0 2020 Rabbit Lake Uranium NRM | NRS 80.3
1 2020 Rabbit Lake Thorium-230 NRM | NRS 75.6
2 2020 Rabbit Lake Radium-226 NRM | NRS 24.0
3 2020 Rabbit Lake Lead-210 NRM | NRS 75.6
4 2020 Rabbit Lake Polonium-210 NRM | NRS 32.1
In [197]:
# Keep only the essential columns for 2020 too (this file's headers have no accents).

df_umm_2020 = df_umm_2020[['Year | Annee', "Facility Name | Nom de l'installation",
                           'Substance Name (English) | Nom de substance (Anglais)',
                           'Stack Emissions | Emissions de cheminees',
                           'Direct Discharge | Evacuations directes']].rename(columns={
    'Year | Annee': 'Year',
    "Facility Name | Nom de l'installation": 'Facility Name',
    'Substance Name (English) | Nom de substance (Anglais)': 'Substance Name (English)',
    'Stack Emissions | Emissions de cheminees': 'Stack Emissions',
    'Direct Discharge | Evacuations directes': 'Direct Discharge'})
df_umm_2020.head()
Out[197]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
0 2020 Rabbit Lake Uranium NRM | NRS 80.3
1 2020 Rabbit Lake Thorium-230 NRM | NRS 75.6
2 2020 Rabbit Lake Radium-226 NRM | NRS 24.0
3 2020 Rabbit Lake Lead-210 NRM | NRS 75.6
4 2020 Rabbit Lake Polonium-210 NRM | NRS 32.1
In [198]:
# Concatenate both dataframes and drop every row that appears in both (keep=False), so only
# the rows that differ remain. After sorting, each differing record shows its two versions side by side.

df = pd.concat([df_umm_2021,df_umm_2020]).drop_duplicates(keep=False)
df = df.sort_values(by=['Facility Name', 'Substance Name (English)', 'Year'])
df

# The result is empty: no differences between the two files.
Out[198]:
Year Facility Name Substance Name (English) Stack Emissions Direct Discharge
  • No changes between the files.
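
One caveat of the concat + drop_duplicates(keep=False) approach is that it also discards rows duplicated within a single file, and it does not say which file a surviving row came from. As an alternative, the sketch below uses an outer merge with `indicator=True`. It is a minimal illustration under stated assumptions: each (Year, Facility Name, Substance Name (English)) combination occurs at most once per file, and the helper name `compare_releases` and the toy values are hypothetical.

```python
import pandas as pd

def compare_releases(old, new,
                     keys=("Year", "Facility Name", "Substance Name (English)"),
                     value_cols=("Stack Emissions", "Direct Discharge")):
    """Return rows that are missing from one file or whose values differ."""
    keys = list(keys)
    # validate="one_to_one" raises an error if a key is repeated within either file.
    merged = old.merge(new, on=keys, how="outer", suffixes=("_old", "_new"),
                       indicator=True, validate="one_to_one")
    changed = pd.Series(False, index=merged.index)
    for col in value_cols:
        a = merged[col + "_old"]
        b = merged[col + "_new"]
        # Treat two missing values as equal; everything else must match exactly.
        same = (a == b) | (a.isna() & b.isna())
        changed |= ~same
    return merged[changed].reset_index(drop=True)

# Toy data for illustration only (not real release values).
old_toy = pd.DataFrame({
    "Year": [2019, 2020],
    "Facility Name": ["Site A", "Site A"],
    "Substance Name (English)": ["Uranium", "Uranium"],
    "Stack Emissions": ["NRM | NRS", "NRM | NRS"],
    "Direct Discharge": [10.0, 12.0],
})
new_toy = pd.DataFrame({
    "Year": [2019, 2020, 2020],
    "Facility Name": ["Site A", "Site A", "Site B"],
    "Substance Name (English)": ["Uranium", "Uranium", "Uranium"],
    "Stack Emissions": ["NRM | NRS", "NRM | NRS", "NRM | NRS"],
    "Direct Discharge": [10.0, 12.5, 3.0],
})

print(compare_releases(old_toy, new_toy))
```

In the returned table, the `_merge` column reads "both" for records whose values changed between files, and "left_only" or "right_only" for records present in only one file. Note that the comparison is exact, so a value stored as text in one file and as a number in the other would be reported as a difference.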